I. INTRODUCTION


A. GENERAL REMARKS 

CD-ROMs (Compact Disc Read Only Memories) provide computer software 
applications developers with intriguing possibilities of making hundreds of megabytes, 
even gigabytes of data readily accessible to personal computer users. Such massive 
storage capacity opens up new realms of potential applications for microcomputer
software developers.

The CD-ROM has a thousand times the storage capacity of a floppy disk. In the 
computer industry, we often improve things by a factor of two or three and the new 
applications are considered evolutionary. But a one thousandfold increase in storage 
capacity enables us to create rich and multifaceted new applications. (Gates, 1986, p.
xi)

Furthermore, a floppy disk can store only a few seconds of full motion, full 
screen color video, whereas a single CD can store as much as an hour of such video
images. The floppy can store only three seconds of high-quality audio, but the CD can
store an hour. It is this remarkable power of the CD-ROM disc to digitally store video
images, audio, data, and computer code in any combination that emphasizes its vast 
potential. 

CD-ROM technology is derived from CD audio technology and uses the same 
basic drive mechanisms and disc manufacturing processes. Because of this close
relationship, CD-ROM player and disc development has benefitted directly from the
technological advances and cost reductions associated with the rapid growth of the CD
audio industry. (Einberger, 1987, p. 31)


B. THE TLOCD SYSTEM

Transaction Ledger on Compact Disc (TLOCD) is the culmination of a U.S.
Navy-supported thesis project conducted in the spring of 1987 at the Naval
Postgraduate School in Monterey, California. It involved the transfer of some
2,000,000 records containing historical transaction data from a magnetic tape medium
to a CD-ROM disc. The records represented all transactions conducted by the Naval
Supply Center at Oakland, California, for the months of October and November 1986.


The records were arranged into three types of files according to their particular
application. The “Transaction” files contained data about conducted transactions such
as ordering and issuing. The “Closing Balance” files contained such information as
quantity on hand and quantity on order. The “Audit Trail” files consisted of pertinent
data about previous transactions.

Reference Technology Inc. of Boulder, Colorado, was tasked with transferring 
the data, creating the indexes, and pressing the disc. They also provided the system 
software to interface between IBM compatible personal computers and the CLASIX
DataDrive Series 500 disc player manufactured by Hitachi. A list of the hardware and
software initially utilized by the TLOCD system can be found in Table 1.





TABLE 1
TLOCD HARDWARE AND SOFTWARE CONFIGURATION

Zenith Z-248 PC (IBM PC/AT Compatible) with
-20 Mbyte Winchester Drive
-1 360K double-sided, double-density
 5 1/4 inch floppy disk drive
-640K RAM
-Intel’s 80286 16-bit Microprocessor
-8 MHz System Clock

Zenith RGB/Enhanced Color Monitor

CLASIX™ DataDrive™ Series 500


The evolution of the TLOCD system attempts to identify an alternative to
alleviate the overcommitment of currently installed TANDEM systems at the
Naval Supply Centers. The systems are saturated with the Transaction Ledger on Disk
(TLOD) database, thus precluding the systems from being utilized for more productive
functions. TLOCD allows the user to query data in much the same way as the TLOD
system. The only difference is in the more effective CD-ROM storage medium used by
TLOCD. However, the user never actually has to know whether the data is stored by
conventional means or whether it resides on a CD-ROM.


C. OBJECTIVES

Unless the file structures for a CD-ROM application are designed carefully, the 
application’s performance is likely to suffer. Typically, poor CD-ROM performance is 
the result of file-structure design that reflects “magnetic-disk think.” Application 
designers often tend to apply rules of thumb learned from working with magnetic 
media. Instead, one needs to focus on the unique strengths and weaknesses of the
CD-ROM. (Zoellick, 1986, p. 177)

It is the purpose of this paper to examine these strengths and weaknesses in the
areas of indexing, file management, and application software issues and to make
recommendations to be considered by future Navy research and development in mass
storage applications. Additionally, the feasibility and adaptability of CD-ROM
technology in U.S. Navy environments will be addressed. The TLOCD prototype will
be referenced throughout this report.


II. CD-ROM OVERVIEW


A. GENERAL REMARKS 

CD-ROM enjoys tremendous leverage from the success of digital audio.
Both products use the same 12 centimeter plastic disc for storing data, and both
employ the same basic manufacturing and playback technologies. CD-ROM thus
benefits from the volume-related cost savings that have driven down the prices of
digital audio and made it so popular and affordable.

The raw specifications of CD-ROM are staggering. A single 4.72 inch disc stores 
530 megabytes of data, the equivalent of 1,500 floppy disks or 28 20-megabyte hard 
disks. That is 250,000 pages--500 books--whole encyclopedias. Yet any piece of
information on the disc can be located and displayed in two or three seconds. (DeTray,
1986, p. 4)


B. PHYSICAL FORMAT 

The CD-ROM's physical format is defined by a standard developed by the 
Philips and Sony corporations and is an extension of their compact digital audio disc 
standard. However, this digital audio parentage also constrains the CD-ROM to an 
unimpressive random-seek performance. In particular, the underlying digital audio 
format results in a data format that is based on constant linear velocity (CLV) 
recording. 

Most magnetic disks use constant angular velocity (CAV) format. Figure 2.1 
shows the sector organization of a typical magnetic disk. Note that the sectors on the 
inner tracks are smaller than those on the outer tracks. This is because CAV is 
another way of saying constant rotational speed. With a CAV format, the linear 
velocity of the disk surface relative to the disk head is greater on the outer tracks where
the disk’s circumference is greater. The outer sectors are also physically larger.

[Figure: sector diagram with labels Track 0, Sector 0; Track 1, Sector 0; Track 0, Sector 1]

Source: BYTE, May 1986.

Figure 2.1 Sector Organization of a CAV Magnetic Disk.


Figure 2.2 illustrates the CLV sector format of a CD-ROM. The relative speed of
the disc surface and disc head stays the same, even as the head moves away from the
center of the disc. A CD-ROM drive maintains this constant linear velocity by actually
changing the disc’s rotational speed as the head moves from track to track. This
format results in sectors of equal length. The actual number of sectors encountered in a
single disc rotation ranges from about nine on the inside of the disc to about 20 on the
outer edge. Therefore, recording must be done in a spiral rather than in a series of
concentric rings. Recording begins at the inside of the disc and spirals outward.

The great advantage that CAV recording has over the CD-ROM’s CLV format is
that the CAV organization makes it easier to find the beginning of a particular sector.
Suppose one wants to jump to a specific sector relative to the start of a file. With a
CAV format, where each track contains a fixed number of sectors, it is very easy to
translate this relative sector number into an absolute track and sector address, given 
the track and sector address of the start of the file. 
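
As a minimal sketch of why the CAV translation is easy, the following Python fragment converts a relative sector number into an absolute track and sector address. The function name and the geometry of 17 sectors per track are illustrative assumptions, not values taken from any particular drive.

```python
def cav_address(start_track, start_sector, relative_sector, sectors_per_track):
    """Translate a sector number relative to the start of a file into an
    absolute (track, sector) address on a CAV disk."""
    # Every CAV track holds the same number of sectors, so the disk can be
    # treated as one linear run of sectors and split back apart with divmod.
    linear = start_track * sectors_per_track + start_sector + relative_sector
    return divmod(linear, sectors_per_track)


# Assumed geometry: 17 sectors per track, file starting at track 10, sector 5.
track, sector = cav_address(10, 5, 40, sectors_per_track=17)
print(f"relative sector 40 is at track {track}, sector {sector}")
```

No comparable one-line formula exists for a CLV disc, because the number of sectors per revolution changes with the radius.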

There is no simple, fixed relationship between a CLV track and the number of 
sectors on the track. Therefore, translating a relative sector number into an absolute 
track and sector address is more complicated. In addition, head movement must be 
accompanied by the mechanical process of speeding up or slowing down the rotational
speed of the disc. Together these account for a major part of the CD-ROM’s relatively
poor performance in locating the desired track. The time required to find the beginning
of a particular track is referred to as seek time.


[Figure: sector diagram with label Sector 20]

Source: BYTE, May 1986.

Figure 2.2 Sector Organization of a CLV CD-ROM Disc.


On the positive side, CLV recording makes more efficient use of the disc surface. 
Rather than spreading out data on the outer tracks as on a CAV disk, the CLV format 
packs the data on the outer tracks just as tightly as on the inner tracks. As a 
consequence, a CLV disc can hold much more information than a comparably sized 
CAV disk. From the standpoint of audio recording, where the primary mode of access 
is sequential, the CLV format is ideal. It packs the maximum amount of music on a
disc without exacting a performance penalty. However, when you build a data format
on top of this audio format, you pay for increased capacity with decreased seek
performance. (Zoellick, 1986)


C. PHYSICAL ADDRESSING 

The CD-ROM’s CLV format rules out using the familiar track and sector
addressing schemes used for most magnetic disks. Instead, the CD-ROM uses a
scheme that can be traced directly to its audio background. Each disc is said to have 60
“minutes” worth of data. Each minute is composed of 60 seconds and each second is
made up of 75 sectors. A single sector can hold 2K bytes of data. Therefore, the entire
disc can hold 540,000K (60 x 60 x 75 x 2K) bytes. The origin of the disc is specified as
00:00:00 (zero minutes, zero seconds, sector zero).
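
The minute, second, and sector scheme can be illustrated with a short Python sketch. The function and constant names are hypothetical and chosen only for this illustration. The sketch converts between an address and a running sector number, and confirms that 60 x 60 x 75 x 2K works out to 540,000K bytes.

```python
SECONDS_PER_MINUTE = 60
SECTORS_PER_SECOND = 75
SECTOR_KBYTES = 2          # 2K bytes of data per sector
DISC_MINUTES = 60


def to_sector_number(minute, second, sector):
    """Running sector number of a minute:second:sector address."""
    return (minute * SECONDS_PER_MINUTE + second) * SECTORS_PER_SECOND + sector


def to_address(sector_number):
    """Inverse of to_sector_number: returns (minute, second, sector)."""
    minute, rest = divmod(sector_number, SECONDS_PER_MINUTE * SECTORS_PER_SECOND)
    second, sector = divmod(rest, SECTORS_PER_SECOND)
    return minute, second, sector


total_sectors = to_sector_number(DISC_MINUTES, 0, 0)
capacity_kbytes = total_sectors * SECTOR_KBYTES
print(total_sectors, "sectors,", capacity_kbytes, "K bytes")
```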

Application developers need not worry about the physical addressing details on
CD-ROMs, just as they do not concern themselves with such details on magnetic
media. The operating system will convert the physical view into a logical view, allowing
the disk to be regarded as a collection of named files rather than a collection of tracks
and sectors. Laser-disc operating systems provide the same type of support for
CD-ROMs.


D. PERFORMANCE MEASUREMENT 

Good CD-ROM software design must reflect an awareness of the CD-ROM’s
weaknesses, in particular its poor seek performance. Table 2 compares a typical
CD-ROM drive with two different types of magnetic-disk drives. The comparisons include
capacity, seek performance, and data-streaming performance during a series of
sequential reads of contiguous data. The sequential-read performance on the magnetic
disk assumes an interleave factor of five, meaning that it takes five disk revolutions to
read all the data in a given track.

An average seek on a full CD-ROM takes five times as long as on a 10-megabyte 
hard disk. When compared to a high-performance magnetic disk, there is more than an
order of magnitude of difference in the seek performance. When designing software for
a magnetic disk, a major effort to avoid seeks should be made. Given the cost of seeks
on a CD-ROM, even more stringent measures should be taken to avoid an average
seek. (Zoellick, 1986, p. 180)

However, Table 2 demonstrates that the cost of a short seek covering only a few
tracks is relatively small. This is because the CD-ROM only needs to move the mirror
to position the laser beam on the disc. It does not have to move the sled
containing the mirror, lenses, and other parts of the disc-reading mechanism. Instead,
the laser bounces a pinpoint of light off the CD-ROM’s surface, which consists of a
pattern of submicroscopic pits. This information is converted into a digital signal and
read by an optical disc drive.

This disparity between the cost of a short, local seek and a longer one is of
significant importance. It means that every opportunity should be taken to minimize
the physical distance between parts of a file to be used in succession. Since the
CD-ROM’s sequential-read performance as shown in Table 2 is very respectable, reading a
large block of data does not cost that much more than reading a short one. The
primary cost is in locating or finding the block.


TABLE 2
SEEK TIMES OF CD-ROMS VS. MAGNETIC DISKS

                         CD-ROM                  Average microcomputer hard disk

Capacity                 540 megabytes           10 megabytes
Number of tracks
  per read head          approximately 18,000    612
Track-to-track seek      1 ms                    5 ms
Average seek             500 ms                  100 ms
Maximum seek             1 sec                   200 ms
Rotational speed         approximately 300 rpm   3600 rpm
                         (variable)
Average latency          100 ms                  8.3 ms
Transfer rate
  for sequential read    150K bytes/sec          95K bytes/sec

Source: BYTE, May 1986.
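
The trade-off between locating a block and reading it can be sketched with a simple cost model. The model below is an assumption made for illustration only: it adds seek time, average latency, and transfer time, and ignores controller and operating system overhead. The CD-ROM figures are those read from Table 2 (500 ms average seek, 1 ms track-to-track seek, 100 ms latency, 150K bytes per second).

```python
def access_time_ms(seek_ms, latency_ms, kbytes, kbytes_per_sec):
    """Estimated time to locate and read one contiguous block."""
    transfer_ms = 1000.0 * kbytes / kbytes_per_sec
    return seek_ms + latency_ms + transfer_ms


# CD-ROM figures as read from Table 2.
CD_AVERAGE_SEEK_MS = 500
CD_SHORT_SEEK_MS = 1
CD_LATENCY_MS = 100
CD_KBYTES_PER_SEC = 150

for kbytes in (2, 16):
    far = access_time_ms(CD_AVERAGE_SEEK_MS, CD_LATENCY_MS, kbytes, CD_KBYTES_PER_SEC)
    near = access_time_ms(CD_SHORT_SEEK_MS, CD_LATENCY_MS, kbytes, CD_KBYTES_PER_SEC)
    print(f"{kbytes:2d}K block: {far:6.0f} ms after an average seek, "
          f"{near:6.0f} ms after a track-to-track seek")
```

Under this model, reading 16K after an average seek takes roughly 15 percent longer than reading 2K, although eight times as much data is delivered. Replacing the average seek with a track-to-track seek removes about half a second from either read.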


E. CD-ROM BENEFITS

The CD-ROM’s adequate sequential-read performance and its ability to rapidly
seek over the range of a few tracks are important to the design of good software. Its
most beneficial characteristic is that it is a read-only medium. It is nonerasable. For
applications demanding secure storage of original versions of valuable documents,
images, or data streams, the primary advantage of nonerasability is evident: “Once the
data are recorded, nobody can modify or erase them short of physically destroying the
media.”

Two other benefits arise from the fact that a CD-ROM is a read-only medium.
First of all, there are never any concerns with insertions, deletions, or modifications.
Therefore, when building a tree, the most frequently used records can be placed in the
nodes nearest the root because they are never going to change. Secondly, the costs of
writing and reading are not equally balanced. A CD-ROM is written only once but is
read over and over again. Therefore, more time and effort should be put into the initial
construction of files and indexes in order to obtain the fastest retrieval possible.
Furthermore, building the file and index structures is often done on a larger machine,
while the retrieval is most likely to be done on a micro. If expensive tasks such as
lexical analysis and text formatting are necessary, it is better to do them once with the
larger computer before creating the disc. Data for a CD-ROM are normally used
interactively but are usually prepared in a batch-processing mode. This provides more
incentive to do as much work as possible while still in the writing stage. See Table 3 for
other CD-ROM advantages.


TABLE 3
ADVANTAGES OF CD-ROM

• PERMANENT/DURABLE: It is an excellent archival medium (currently Sony
  disks are guaranteed for 50 years). Also very rugged and able to withstand
  adverse weather and handling conditions.

• NON-VOLATILE: No loss or altering of data during power failure or surges.

• LOW COST: The ‘per MB’ cost of data is less than any other storage medium.

• EXTREMELY PORTABLE: The media is removable and offers portability of
  data.

• SECURITY: Physical control can be maintained easily and thus large
  quantities of sensitive data can be controlled. Also, a possibility exists to
  manufacture the disk out of glass instead of polycarbonate material and thus,
  for military purposes, emergency destruction could be easily accomplished.

• SMALL PHYSICAL VOLUME/WEIGHT: Easily carried or mailed, etc., at a very
  reasonable expense.

• NOT ABLE TO BE ALTERED: This media is Read Only Memory (ROM) and as
  such, it is extremely useful for audit trails in the legal and financial world
  where magnetic media have not been allowed as evidence due to the
  alterability of that media.

• ENORMOUS DATA STORAGE CAPABILITY: Up to 600 MB of data on a single
  side of a single disk which is only 4.72 inches in diameter.

• USER FAMILIARITY: It is simply another PC peripheral that, to the user,
  looks just like a read-only MS-DOS etc. disk. Also, the average user has had
  experience with the same physical disk in the CD-Audio environment and
  therefore already feels more comfortable with it.

• BACKUP IS ELIMINATED: There is no need to back up the disk because it is
  ROM. For safety’s sake, multiple copies can be ordered at the time of disk
  pressing and stored in separate locations.

• ELECTRO-MAGNETIC PULSE (EMP) HAS NO EFFECT: This is not a magnetic
  media and therefore any sort of electro-magnetic energy has no effect on it.

• NO HEAD-CRASHES: The read-device is optical and does not contact the
  disk in any way; therefore, head-crashes are virtually eliminated.

Source: Lind, 1987.


III. CD-ROM APPLICATIONS


A. GENERAL REMARKS 

The basic technology for read-only optical discs was developed to distribute
movies and high-fidelity music. Consumer electronics companies spent hundreds of
millions of dollars over the past decade in Europe, Japan, and the United States to
make the videodisc and audiodisc inexpensive, reliable, and long lasting. As a result,
data distribution on CD-ROMs was a natural and direct extension of the basic 
technology. (Hensel, 1986, p. 487) 

Information users who have access to a microcomputer and optical disc player
are now able to access entire collections of databases that have been placed on
CD-ROM. The resulting savings are significant. Even if there is no other reason for buying
the microcomputer and disc player, they pay for themselves with a few hours of
service per week when the alternative is online connect charges. However, much
greater savings are possible. The Internal Revenue Service has begun a project entitled
“Files Archival Image Storage and Retrieval” which it estimates will save as much as
$36 million annually in storage costs. (Contract, 1986, p. 18)


B. LIBRARY APPLICATIONS 

CD-ROM library applications are essentially of two types. On the one hand they 
are designed as support tools for library automation activities, including traditional 
book cataloging and local public access catalogs. On the other hand, they provide
inexpensive around-the-clock availability of databases previously produced in paper
format. (Vielin, 1987, p. 509) 

A critical problem often faced by librarians is the growth of their collections,
especially the periodical and resource indexes. Increasing volumes of new data, in both
print and microform, have meant that increased space is needed to house them. The
ability of CD-ROM to store hundreds of thousands of pages in a limited space is very
appealing for this very reason. The medium is practically indestructible. Not only can
dozens of books be stored on disc, but rare and fragile documents, never before made
available to the public, can also be stored in their original form without concern that
they will be damaged or destroyed by patrons.


Grolier Encyclopedia has already produced a version of the Academic American
Encyclopedia on optical disc. Also, the Library of Congress is currently conducting a
special optical disc pilot program that includes rapid high-resolution scanning, storage
and retrieval of images of journal titles, law materials, manuscripts, sheet music, maps,
and technical reports. The British Library is experimenting with the development of 
bibliographic files on CD-ROM. 

Moreover, Software Mart, Inc. (SMI) has developed an illustrative dictionary 
with voice annotation on CD-ROM. It is called The Visual Dictionary and could propel
illustrated consumer dictionaries into foreign language training vehicles. (Kuhn, 1987,
p. 3)


C. MEDICAL AND LEGAL APPLICATIONS 

It can be argued that where knowledge is concise, it should be delivered in a
concise way. This is particularly applicable to clinical, action-oriented knowledge.
(Huntting, 1986, p. 529) Micromedex, Inc. has applied this approach with considerable
success and has produced the first medical information product to actually achieve
commercially successful distribution with their “Computerized Clinical Information
System” (CCIS). The application utilizes highly structured menus that combine easily
understood screen displays to bring clinical management protocols into the emergency
room with remarkable speed and precision. This design is successful because it
recognizes that the emergency room physician or poison center technician is not
working in a contemplative environment when he or she has need for the product. On
the contrary, there are a multitude of distractions, perhaps even a life hanging in the
balance. Consequently, the information must be delivered concisely and accurately 
with no time for discussion or debate. (Huntting, 1986, p. 531) 

The world-wide use of CD-ROM in the medical and health fields continues to 
grow. The Canadian Center of Occupational Health and Safety has incorporated the 
largest publicly available chemical database onto a CD-ROM and has included it in its
efforts to improve data distribution and employee safety programs. (Abeytunga, 1987,
p. 1)

Attorneys and tax accountants must review a tremendous amount of reference
material that may be relevant to their clients’ legal or tax needs. Equipped with an
entire electronic library at their fingertips, attorneys and tax accountants are sure to
find it easier to track down and review material and thus improve their ability to serve
their clients. CD-ROM is an ideal medium for many legal applications dealing with
taxes, statutes, case histories, legal forms, and patents.


D. CARTOGRAPHY APPLICATIONS

One CD-ROM can store a complete digital map of every street in New England
plus additional information equivalent to 300 unabridged copies of Moby Dick. The
basic map information, judiciously compressed, amounts to 120 to 150 bytes per street.
Since 60 percent of the U.S. population lives on about one million streets represented
in the Census Bureau’s files, a simple extrapolation, allowing for rural streets that
wiggle more than their urban counterparts, yields a nationwide digital map that will fit
on a single CD-ROM. (Cooke, 1986, p. 560)

It would be more appropriate to publish regional or state discs supplemented
with a wealth of information targeted for specific markets. The business edition, for
example, would contain a list of all companies in the region indexed by both industrial
classification and geographic location. The family edition would have data about 
restaurants, tourist attractions, shopping centers, stores, and museums. 

DeLorme Mapping Systems of Freeport, Maine, has stored DeLorme’s World
Atlas on CD-ROM. Also, the Compaq Deskpro 386 displays maps of the entire earth
from one laser disc in conjunction with a personal computer (Vizachero, 1986, p. 58). 

LaserPlot, Inc. has produced the first CD-ROM-based position tracking system 
for marine navigation. It displays full-color, digitized National Oceanic and
Atmospheric Administration (NOAA) charts in various scales (Belanger, 1987, p. 13).


E. U.S. NAVY APPLICATIONS 

Current investigation into interest in CD-ROM technology within the U.S. Navy
revealed a NAVSEA sponsored project entitled “Computer-Aided Technical
Information System” (CATIS). CATIS is primarily involved with the placing of
engineering technical manuals for the Trident-Class submarines onto CD-ROM discs.

Further investigation discovered an ongoing project at the Naval Ship Weapons
Systems Engineering Station (NSWSES) in Port Hueneme, California. The project has
been tabbed “Engineering Data Management Information and Control System”
(EDMICS) and is involved with placing engineering diagrams onto CD-ROMs for use
by major industrial facilities. (Lind, 1987, p. 60)

Image Conversion Technologies has been awarded a $2.5 million contract for 
image management services for the “Naval Print on Demand” system. ICT will digitize 
about 1.8 million pages of military specifications to be stored on two 80-gigabyte
optical disc library units. ICT’s management system will be used for storage, indexing,
and retrieval of all documents to be printed, while its order-entry system will be used to
manage orders and perform administrative operations. The anticipated printing volume
is 225,000 pages per day with a required turn-around time of two days. (Lind, 1987, p.
61)

The Navy is also conducting research on CD-ROM technology at the Naval
Postgraduate School in Monterey, California. The thrust of this research is concerned
with the adaptability of systems such as the TLOCD prototype addressed in the
introduction of this paper.


IV. THE TYPICAL CD-ROM DATABASE


A. DATA FILES 
1. Data Records 
The purpose of any database is to provide access to its data records. The data 
records in a CD-ROM database can be of either fixed length or variable length. The
maximum size of a CD-ROM record is 2,147,483,647 bytes, but there must be a
memory buffer large enough for the largest record to be read. 
2. Data Records and Keys 
Keys are fixed-length byte strings which are organized into indexes to provide
access to the data records. Keys do not have to be physically contained in the data
records and the structure of the records need only be known to the application
program. However, if the keys are contained in the records at fixed offsets from their
beginning then this information can be stored in the index headers, thus allowing them
to be accessed by application programs.
3. Data Records and Indexes 
Data record keys are arranged into indexes. Indexing makes it seem that the 
records of a data file are arranged in the order of the keys for that particular index.
Because multiple indexes can be supported, there may be as many orders to the records
as there are indexes.
4. Physical and Logical Data Files 
Files of data records are provided by the information publisher. For example, 
the Naval Supply Center in Oakland provided Reference Technology with the data
records required for the TLOCD project. The TLOCD application can handle up to 32
files, which is the limit imposed by the Reference Technology file management system.
These files can be placed on either optical or magnetic devices or both. All the physical
files are logically concatenated to form a single logical data file, and the offsets in the
indexes refer to offsets from the beginning of this logical file. A limited update
capability can be supported with multiple data files by logically appending new data
files to existing data files and creating new indexes for the resulting logical data file.
(Key, 1986, p. 17)
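
The logical concatenation described above can be sketched as follows. This is an illustration only and not the Reference Technology implementation; the class name, file names, and file sizes are hypothetical. The sketch maps an offset in the logical data file back to a physical file and an offset within it, and shows that appending a new file leaves existing offsets unchanged.

```python
from bisect import bisect_right

MAX_FILES = 32  # limit stated in the text for the file management system


class LogicalDataFile:
    """Physical files viewed as one logical file (illustrative sketch)."""

    def __init__(self):
        self.names = []
        self.starts = [0]   # starts[i] is the logical offset where file i begins

    def append(self, name, size):
        if len(self.names) >= MAX_FILES:
            raise ValueError("no more than 32 physical files are supported")
        self.names.append(name)
        self.starts.append(self.starts[-1] + size)

    def locate(self, logical_offset):
        """Return (file name, offset inside that file) for a logical offset."""
        if not 0 <= logical_offset < self.starts[-1]:
            raise ValueError("offset lies outside the logical data file")
        i = bisect_right(self.starts, logical_offset) - 1
        return self.names[i], logical_offset - self.starts[i]


logical = LogicalDataFile()
logical.append("TRANS.DAT", 100)     # hypothetical names and sizes
logical.append("BALANCE.DAT", 50)
print(logical.locate(120))           # falls 20 bytes into the second file
logical.append("UPDATE.DAT", 200)    # a later update is appended logically
print(logical.locate(120))           # earlier offsets still resolve the same way
```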


B. KEY RECORDS
1. Keys
Keys are used to generate key records. Keys may be ASCII character strings,
unsigned byte strings with most significant byte first (e.g., left-justified ASCII or
EBCDIC text), signed integers with least significant byte first (e.g., IBM-PC and VAX
integers), or unsigned integers with least significant byte first (IBM-PC and VAX
unsigned integers).
2. Key Records 
Key records are the units from which indexes are formed. They contain a key
field, to which other information, including the record’s location in the data file, is
affixed. Figure 4.1 summarizes the logical structure of key records compiled by
Reference Technology.


[Figure: layout of a key record, fields from left to right]
Key: fixed length and position, up to 32,767 bytes in length
Offset of data record: 4 bytes, signed integer, LSB first
Data record length (optional)
Record number (optional): must be used for hash table
Extra data (optional)
The sum of the contents of all key records and data is limited only by the
maximum file size of 2 Gbytes-1.

Source: Key Record Manager.

Figure 4.1 Key Record Logical Structure.


The data record length is optional because it can be calculated from the offset 
of the next data record. The key record number is needed only for hash-table indexes,
because a record number can be calculated directly from the position of a record in a
balanced tree. Duplicate key records are allowed. They are sorted secondarily by data
offset in ascending order.
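
A minimal sketch of the simplest key record form follows: a fixed-length key followed by a 4-byte signed offset stored least significant byte first. The key width of four bytes, the function names, and the sample symbols are assumptions made for illustration; the optional fields are omitted. The sketch also shows the secondary sort by data offset for duplicate keys, and how record lengths can be derived from the offset of the next record.

```python
import struct

KEY_LENGTH = 4  # assumed key width for this illustration


def pack_key_record(key, data_offset):
    """Fixed-length key followed by a 4-byte signed offset, LSB first."""
    if len(key) != KEY_LENGTH:
        raise ValueError("keys must be exactly KEY_LENGTH bytes")
    return key + struct.pack("<i", data_offset)


def unpack_key_record(raw):
    """Return (key, data_offset) from a packed key record."""
    (data_offset,) = struct.unpack("<i", raw[KEY_LENGTH:KEY_LENGTH + 4])
    return raw[:KEY_LENGTH], data_offset


def sort_key_records(records):
    """Sort by key, then by data offset so duplicate keys stay in file order."""
    return sorted(records, key=unpack_key_record)


def record_lengths(offsets, file_size):
    """Length of each data record, derived from the next record's offset."""
    ordered = sorted(offsets)
    return [nxt - cur for cur, nxt in zip(ordered, ordered[1:] + [file_size])]


records = [pack_key_record(b"IBM ", 300),
           pack_key_record(b"AAPL", 500),
           pack_key_record(b"IBM ", 20)]
print([unpack_key_record(r) for r in sort_key_records(records)])
print(record_lengths([0, 20, 300, 500], file_size=640))
```

Sorting on the unpacked pair rather than on the raw bytes matters here, because an offset stored least significant byte first does not compare correctly as a byte string.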




3. Creating Key Record Files
Key record files can be created by the CD-ROM manufacturer or by the data
publisher. The decision should be based on the structure of the data records. If the key
is in a fixed location in a data record, the key records can be generated automatically
by the disc manufacturer. Otherwise, the key records must be provided by the
publisher in the format described in Figure 4.1.


C. INDEX FILES
1. Indexes 

Indexes are created by putting sorted key records into an index. Each key
index provides access to the data records in the order of the key records that compose
it. Key records for an index may be arranged in either ascending or descending order.
Each index is assigned an integer identifier, beginning with zero, which is always the
data index. Subsequent key indexes are assigned integers beginning with one.

The records in the data index contain only the byte offsets of the data
records in the logical data file. Since the data index is keyed by the record offsets, it
provides sequential access to the records in the order they were received by the
manufacturer. The data index for databases with records of fixed length is normally a
virtual index. For databases with records of variable length, a balanced-tree index
containing the record offsets is created. This makes it possible to find a record either by
sequential position in the sequence of data records, or by byte offset in the logical data
file.
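
The two forms of the data index can be contrasted with a short sketch. For fixed-length records the index is virtual: the offset is simply computed. For variable-length records the offsets must be stored; here a sorted list searched with Python's bisect module stands in for the balanced tree, which is a simplification made for illustration. All names are hypothetical.

```python
from bisect import bisect_right


def fixed_length_offset(record_number, record_length):
    """Virtual data index: the offset is computed, nothing is stored."""
    return record_number * record_length


class VariableLengthDataIndex:
    """Stored record offsets; a sorted list stands in for the balanced tree."""

    def __init__(self, offsets):
        self.offsets = sorted(offsets)
        if not self.offsets or self.offsets[0] != 0:
            raise ValueError("the first record is expected at offset 0")

    def offset_of(self, record_number):
        """Find a record by sequential position."""
        return self.offsets[record_number]

    def record_containing(self, byte_offset):
        """Find a record by byte offset in the logical data file."""
        if byte_offset < 0:
            raise ValueError("byte offset must not be negative")
        return bisect_right(self.offsets, byte_offset) - 1


index = VariableLengthDataIndex([0, 20, 300, 500])
print(fixed_length_offset(3, 128))     # fourth record of a 128-byte-record file
print(index.offset_of(2))              # third record by position
print(index.record_containing(299))    # record that holds byte 299
```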

The maximum number of indexes to a Reference Technology database is
2,147,483,647. However, the number of indexes which can be accessed at one time is
limited by available memory. Each open index in the database requires
memory for an Index Control Block (89 bytes, plus 12 bytes for each level of index)
and for a key record buffer. Assuming two-level indexes and 32-byte key records, an
IBM PC with 384 Kbytes of available memory could support 2711 open indexes. (Key,
1986, p. 19)
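
The figure of 2711 open indexes can be reproduced with a few lines of arithmetic. The sketch assumes that the key record buffer holds exactly one key record and that one Kbyte is 1024 bytes; both assumptions are the author's reading of the text rather than documented facts about the library.

```python
ICB_BASE_BYTES = 89        # Index Control Block, fixed part
ICB_BYTES_PER_LEVEL = 12   # additional bytes for each level of index


def max_open_indexes(memory_bytes, levels, key_record_bytes):
    """Number of indexes that fit in memory, one control block and one
    key record buffer per open index."""
    per_index = ICB_BASE_BYTES + ICB_BYTES_PER_LEVEL * levels + key_record_bytes
    return memory_bytes // per_index


# Two-level indexes, 32-byte key records, 384 Kbytes of available memory:
# each open index needs 89 + 24 + 32 = 145 bytes.
print(max_open_indexes(384 * 1024, levels=2, key_record_bytes=32))
```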

2. Hash Table Indexes

Well-designed hash tables support exact-match key searches with at most one
disc access. Positioning by key order will require at most two disc accesses.
Partial-match searches are supported, but will require approximately twice as many seeks as
the logarithm base two of the number of index pages in the hash table. (Key, 1986, p.
19)




The key records for a hash table are extended to include a record
number. A cross-reference table is appended to the hash table to allow positioning by
key order with the overhead of a single additional disc access, thereby allowing a
binary search of the hash table for partial matches.
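
The interplay of hash pages and cross-reference table can be sketched as below. This is an illustrative model and not the Key Record Manager's actual layout: the class and method names are hypothetical, CRC-32 is an arbitrary stand-in for the hash function, each page read or cross-reference lookup is counted as one disc access, and the binary search runs over records rather than index pages for simplicity.

```python
import zlib


class HashIndex:
    """Hash pages plus a cross-reference table in key order (sketch)."""

    def __init__(self, keys, n_pages):
        self.n_pages = n_pages
        self.pages = [[] for _ in range(n_pages)]
        self.xref = []       # record number (key order) -> (page, slot)
        self.accesses = 0    # simulated disc accesses
        for record_number, key in enumerate(sorted(keys)):
            page = self._page_of(key)
            self.xref.append((page, len(self.pages[page])))
            self.pages[page].append((key, record_number))

    def _page_of(self, key):
        return zlib.crc32(key) % self.n_pages

    def _read_page(self, page):
        self.accesses += 1
        return self.pages[page]

    def find_exact(self, key):
        """Exact match: a single page read."""
        for stored_key, record_number in self._read_page(self._page_of(key)):
            if stored_key == key:
                return record_number
        return None

    def key_at(self, record_number):
        """Position by key order: cross-reference lookup, then a page read."""
        self.accesses += 1
        page, slot = self.xref[record_number]
        return self._read_page(page)[slot][0]

    def find_prefix(self, prefix):
        """Partial match: binary search in key order, two accesses per probe."""
        lo, hi = 0, len(self.xref)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.key_at(mid) < prefix:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self.xref) and self.key_at(lo).startswith(prefix):
            return lo
        return None


index = HashIndex([b"IBM", b"AAPL", b"DEC", b"HP", b"INTC"], n_pages=3)
print(index.find_exact(b"IBM"), index.find_prefix(b"D"), index.accesses)
```

Because every probe of the binary search costs a cross-reference access plus a page access, the number of accesses for a partial match grows as about twice the base-two logarithm of the number of entries searched.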

3. Balanced Tree Indexes

A balanced tree for each index is produced by placing key records in fixed-length
index pages, which are arranged in a tree so that examining the records in a
page of the tree at one level tells which page to examine at the next lower level. Since
there is only one page at the top level, only one page on each level needs to be
examined to locate a specified key.
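
The page-per-level search can be sketched as follows. The layout is a simplified assumption made for illustration: every page holds up to page_size entries, upper-level pages hold the first key of each page below, and plain integers stand in for key records. The function names are hypothetical. Because the tree is built once from sorted keys and never updated, the record number falls out of the position reached at the bottom level.

```python
from bisect import bisect_right


def build_levels(sorted_keys, page_size):
    """Return the index levels, top level first; each level is a list of pages."""
    if not sorted_keys or page_size < 2:
        raise ValueError("need at least one key and a page size of two or more")
    level = [sorted_keys[i:i + page_size]
             for i in range(0, len(sorted_keys), page_size)]
    levels = [level]
    while len(level) > 1:
        # The next level up holds the first key of every page on this level.
        firsts = [page[0] for page in level]
        level = [firsts[i:i + page_size]
                 for i in range(0, len(firsts), page_size)]
        levels.append(level)
    levels.reverse()
    return levels


def find(levels, page_size, key):
    """Return (record number or None, pages examined)."""
    page_no = 0
    pages_read = 0
    for level in levels:
        page = level[page_no]            # exactly one page is read per level
        pages_read += 1
        slot = max(bisect_right(page, key) - 1, 0)
        page_no = page_no * page_size + slot   # page to examine one level down
    # After the bottom level, page_no is the position among all sorted keys.
    found = page[slot] == key
    return (page_no if found else None), pages_read


keys = list(range(0, 100, 2))            # 50 sorted keys
levels = build_levels(keys, page_size=4)
print(len(levels), find(levels, 4, 42), find(levels, 4, 43))
```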


D. CONFIGURATION FILES 

A configuration file contains the file specifications (the complete volume, path,
and name) of each of the data files and index files that make up a database. Its
function is to map the logical correspondences between index identifiers and the
physical indexes. Performance considerations may require certain index files to be
copied to a magnetic device. For this reason, a configuration file contains only
printable ASCII characters. This allows the use of a text editor to modify the volumes
or paths in a magnetic copy of a configuration file. (Key, 1986, p. 24)


V. KEY RECORD UTILIZATION 


A. KEY RECORD MANAGER 

Key Record Manager is a software access program for files with structured fields 
and records. It was designed by Reference Technology primarily as a tool to be used in 
conjunction with CD-ROM databases. It provides an Indexed Sequential Access 
Method (ISAM) comparable to mainframe retrieval systems for record-oriented 
databases. The Key Record Manager allows for two index structures, a balanced tree 
and a hash table. The Key Record Manager software is implemented as a library of C 
language functions that can be linked to application programs which require access to 
supported databases. 


B. SAMPLE DATABASE 

CD-ROM databases normally consist of large files, each organized into similarly 
structured data records which are divided into fields. The data record fields consist of 
key fields which are indexed and data fields which are not. The easiest way to 
conceptualize such a database is in two dimensions. A data record, the individual entry 
for a database, is the row; the field is part of a column of similar information for each 
of the rows. 

Figure 5.1 is an example of a simplified, fictitious stock market database. It was 
reproduced from Reference Technology's Key Record Manager and will be referred to 
throughout the remainder of this chapter. The data records in this example are of 
variable length and are arranged in the alphabetical order of their ticker tape symbols. 

The offset field refers to the offset of the record from the beginning of the data 
file. It is not usually represented within the record but is implicit in the ordering of the 
records within the file. The comment field is text which is not shown completely 
because it varies in length for each company. 


C. USING KEYS TO BUILD KEY RECORDS 

There must be a sorted file of key records in order to construct indexes. It should 
be placed in a hash table or tree for quick access. The key fields of the records are 
used to create key records which contain a copy of the key field and the offset of the 
record associated with that particular key field in the data file. Figure 5.2 shows a key 
record generated from the Dividend field in one of the data records. 
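
As a minimal sketch of this step, the fragment below builds key records for the Dividend field from a few of the sample data records. The dictionary layout and the function name are assumptions made for illustration; the pairs are written key first so that an ordinary sort yields key order.

```python
def build_key_records(data_records, field):
    """Return (key, offset) pairs sorted by key, then by offset."""
    return sorted((record[field], offset) for offset, record in data_records)


# A few rows of the sample database: (offset in the data file, fields).
data_records = [
    (0,     {"Symbol": "ABC", "Div": 1.60}),
    (21677, {"Symbol": "CAB", "Div": 0.45}),
    (29941, {"Symbol": "CAR", "Div": 0.72}),
    (88007, {"Symbol": "EBR", "Div": 1.60}),
]

dividend_keys = build_key_records(data_records, "Div")
for key, offset in dividend_keys:
    print(offset, key)
```

Records with equal keys (the two 1.60 dividends) keep the order of their offsets, which matches the ordering of the Dividend index in the sample database.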




Offset  Symbol  Name       Exc.  SIC   Price  Earnings  Div.  Date     Comment 

0       ABC     AgBusCo    O     0112  ?      3.50      1.60  3/31/86  Corp. farming is 
10322   BAC     Tobacco    A     0144  15     ?                        Chewing tobacco 
21677   CAB     Taxico     ?     4577  42     6.57      .45   1/15/86  This taxi compan 
29941   CAR     Meat, Inc  O     0128  8      .45       .72   5/1/86   Meat products for 
40443   DDE     Dicers     O     5770  17     ?                        This fast-growin 
56678   DRI     Realest    N     6344  1      (3.44)                   Poor investments 
73419   DST     DenStand   O     5057  34     1.21                     Designer jeans. 
88007   EBR     EBanks     O     6776  34     5.22      1.60  3/1/86   Regional banking 
101571  EST     Clocks     ?     5470  ?      2.11      .72   4/1/86   Despite its name 
110849  FIN     Finbank    O     6776  13     1.86      1.00  1/15/86  Suspending the di 


Source: Key Record Manager, p. 6. 


Figure 5.1 Sample Stock Market Database. 


D. USING KEY RECORDS TO BUILD INDEXES 

The indexes can be constructed once all the key records have been created from 
the database keys. A complete database would contain indexes for all the data record 
keys. The indexes are in turn placed in index files and are used to access the data 
records themselves. The indexes could all be placed in one file or they could be placed 
in separate files. Figure 5.3 contains all the indexes generated for the key fields in the 
sample database. Note that some of the fields such as Exchange, Date, and Comment 
are not key fields and therefore cannot be searched. 


E. SEARCHING INDEXES 

Indexes are a space-saving device because they are made up of key records rather 
than whole data records. Only one set of data records need be mastered onto a CD-ROM 
disc, with access to the single copy of the data records being made available in a 
different order depending on which index is utilized. This requires much less space than 
putting the data records on the disc in different places for different sort sequences. 


Data Record: 
Offset  Symbol  Name    Exc.  SIC   Price  Earnings  Div.  Date    Comment 

88007   EBR     EBanks  O     6776  34     5.22      1.60  3/1/86  Regional banking 


Key Record: 
Offset  Key 

88007   1.60 


Source: Key Record Manager, p. 7. 


Figure 5.2 Key Record Generation. 


The data records on a CD-ROM have the sequence shown by their offsets and 
will always retain that order in the data file. However, the indexes to the records 
have the order of their keys which have previously been sorted. Therefore, creating 
indexes for the key fields makes it seem as if the data records are arranged in a series of 
different orders, one for each index used to access them. In our example, the data index 
(Index 0) is used to access the records in their original order. Figure 5.4 shows the 
order of the records when indexed by Name (Index 2) and when indexed by Price 
(Index 4). 

Conceptually, the search for a matching key is accomplished by beginning at one 
end of the key sequence and searching the keys sequentially towards the other end until 
a match or close match is found. For ascending searches, the first key equal to or 
greater than the desired key will be retrieved. For descending searches, the first key 
equal to or less than the desired key will be retrieved. Thus one could search the Name 
index for “Tob” and retrieve “Tobacco” if the search is ascending, or retrieve “Taxico” if 
the search is descending. In reality it is not a sequential search but is actually a 
balanced tree traversal or hash table look-up. Care should always be taken to design 
these structures so that the number of comparisons and accesses can be minimized. 


Index 0        Index 1            Index 2               Index 3 
(Data index)   Symbol             Name                  Industry Code 

Offset         Offset   Symbol    Offset   Name         Offset   Code 

0              0        ABC       0        AgBusCo      0        0112 
10322          10322    BAC       101571   Clocks       29941    0128 
21677          21677    CAB       73419    DenStand     10322    0144 
29941          29941    CAR       40443    Dicers       21677    4577 
40443          40443    DDE       88007    EBanks       73419    5057 
56678          56678    DRI       110849   Finbank      101571   5470 
73419          73419    DST       29941    Meat, Inc    40443    5770 
88007          88007    EBR       56678    Realest      56678    6344 
101571         101571   EST       21677    Taxico       88007    6776 
110849         110849   FIN       10322    Tobacco      110849   6776 


Index 4            Index 5               Index 6 
Price              Earnings              Dividend 

Offset   Price     Offset   Earnings     Offset   Dividend 

56678    1         56678    (3.44)       10322 
29941    8         10322    ?            40443 
110849   13        29941    .45          56678 
10322    15        40443    ?            73419 
40443    17        73419    1.21         21677    .45 
0        ?         110849   1.86         29941    .72 
101571   ?         101571   2.11         101571   .72 
73419    34        0        3.50         110849   1.00 
88007    34        88007    5.22         0        1.60 
21677    42        21677    6.57         88007    1.60 


Source: Key Record Manager. 


Figure 5.3 Key Created Indexes. 


The records in the example, when accessed by Name (Index 2), would appear to be 
ordered as follows: 


Offset  Symbol  Name       Exc.  SIC   Price  Earnings  Div.  Date     Comment 

0       ABC     AgBusCo    O     0112  ?      3.50      1.60  3/31/86  Corp. farming is 
101571  EST     Clocks     ?     5470  ?      2.11      .72   4/1/86   Despite its name 
73419   DST     DenStand   O     5057  34     1.21                     Designer jeans. 
40443   DDE     Dicers     O     5770  17     ?                        This fast-growin 
88007   EBR     EBanks     O     6776  34     5.22      1.60  3/1/86   Regional banking 
110849  FIN     Finbank    O     6776  13     1.86      1.00  1/15/86  Suspending the di 
29941   CAR     Meat, Inc  O     0128  8      .45       .72   5/1/86   Meat products for 
56678   DRI     Realest    N     6344  1      (3.44)                   Poor investments 
21677   CAB     Taxico     ?     4577  42     6.57      .45   1/15/86  This taxi compan 
10322   BAC     Tobacco    A     0144  15     ?                        Chewing tobacco 


If accessed by Price (Index 4), the apparent order of the data records would be: 


Offset  Symbol  Name       Exc.  SIC   Price  Earnings  Div.  Date     Comment 

56678   DRI     Realest    N     6344  1      (3.44)                   Poor investments 
29941   CAR     Meat, Inc  O     0128  8      .45       .72   5/1/86   Meat products for 
110849  FIN     Finbank    O     6776  13     1.86      1.00  1/15/86  Suspending the di 
10322   BAC     Tobacco    A     0144  15     ?                        Chewing tobacco 
40443   DDE     Dicers     O     5770  17     ?                        This fast-growin 
0       ABC     AgBusCo    O     0112  ?      3.50      1.60  3/31/86  Corp. farming is 
101571  EST     Clocks     ?     5470  ?      2.11      .72   4/1/86   Despite its name 
73419   DST     DenStand   O     5057  34     1.21                     Designer jeans. 
88007   EBR     EBanks     O     6776  34     5.22      1.60  3/1/86   Regional banking 
21677   CAB     Taxico     ?     4577  42     6.57      .45   1/15/86  This taxi compan 


Source: Key Record Manager, pp. 9-10. 


Figure 5.4 Searching On Specified Indexes. 
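
The ascending and descending searches described in this section can be sketched with a binary search over the sorted Name index. This is only an illustration using Python's standard bisect module; it is not the Key Record Manager interface, and the function names are assumed.

```python
import bisect

# Name index (Index 2) from the sample database: (key, offset), in key order.
name_index = [
    ("AgBusCo", 0), ("Clocks", 101571), ("DenStand", 73419),
    ("Dicers", 40443), ("EBanks", 88007), ("Finbank", 110849),
    ("Meat, Inc", 29941), ("Realest", 56678), ("Taxico", 21677),
    ("Tobacco", 10322),
]
keys = [key for key, _ in name_index]


def search_ascending(target):
    """First key equal to or greater than the target, or None."""
    i = bisect.bisect_left(keys, target)
    return name_index[i] if i < len(keys) else None


def search_descending(target):
    """First key equal to or less than the target, or None."""
    i = bisect.bisect_right(keys, target) - 1
    return name_index[i] if i >= 0 else None


print(search_ascending("Tob"))    # ('Tobacco', 10322)
print(search_descending("Tob"))   # ('Taxico', 21677)
```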


F. KEY RECORDS FOR SPECIAL PURPOSES 
1. Partial Keying of Data Records 
Index performance is generally better when smaller key records are involved. 
This is especially true for balanced trees, where larger key records may result in 
additional tree levels and therefore cause additional disc accesses. Index size can be 
greatly reduced in some cases if some data records are not keyed on every index. Since 
the Symbol index in our example database is in the same order as the data records, it 
becomes possible to key only the first record in each CD-ROM sector. Then a partial 
match search in the much smaller resulting index could be followed with an exact match 
search in the data records themselves. Index size can also be reduced by not indexing 
records on key fields that are blank. 
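
A rough sketch of partial keying follows. It assumes hypothetical fixed-length 256-byte records so that several records share a 2048-byte sector; the key names, record size, and function names are all illustrative.

```python
import bisect

SECTOR_BYTES = 2048


def build_sparse_index(records):
    """Key only the first record that starts in each sector.

    records: (key, offset) pairs in key order, as stored in the data file.
    """
    sparse = []
    last_sector = None
    for key, offset in records:
        sector = offset // SECTOR_BYTES
        if sector != last_sector:
            sparse.append((key, sector))
            last_sector = sector
    return sparse


def find_offset(records, sparse, key):
    first_keys = [k for k, _ in sparse]
    i = bisect.bisect_right(first_keys, key) - 1   # partial-match step
    if i < 0:
        return None
    sector = sparse[i][1]
    # Exact-match step: examine only the records held in that one sector.
    for k, offset in records:
        if offset // SECTOR_BYTES == sector and k == key:
            return offset
    return None


# Hypothetical data file: 32 records of 256 bytes each, eight per sector.
records = [("K%03d" % n, n * 256) for n in range(32)]
sparse = build_sparse_index(records)
print(len(records), "records,", len(sparse), "index entries")
```

In this sketch the index shrinks from 32 entries to 4, at the cost of scanning one sector of data records after the index search.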
2. Key Records With Extra Information 
Key records may contain additional information besides the key and offset 
fields. Figure 5.5 displays such a record. A length field may be included for variable- 
length records. However, it is not essential because the length of the data record could 
be determined by finding the offset of the next data record and subtracting, but this 
would require an extra access to the data index (Index 0). 


Data Record: 
Offset  Symbol  Name      Exc.  SIC   Price  Earnings  Div.  Date  Comment 

73419   DST     DenStand  O     5057  34     1.21                  Designer jeans. 


Key Record: 
Offset  Record length  Key Number      Key 
        (optional)     (hashing only) 

73419   14588          6               DST 


Source: Key Record Manager, p. 12. 


Figure 5.5 Keys with Additional Data. 
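
The alternative to storing a length field, subtracting offsets found in the data index, can be sketched as follows. The offsets are those of the sample database; the function name and the optional file_size argument for the last record are assumptions.

```python
import bisect

# Data index (Index 0): record offsets in file order.
data_index = [0, 10322, 21677, 29941, 40443, 56678, 73419, 88007, 101571, 110849]


def record_length(offset, file_size=None):
    """Length of the record at offset, from the offset of the next record."""
    i = bisect.bisect_right(data_index, offset)
    if i < len(data_index):
        return data_index[i] - offset
    # The last record has no successor; its length needs the file size.
    return None if file_size is None else file_size - offset


print(record_length(73419))   # 14588
```

The result for the DenStand record agrees with the record length stored in the key record of Figure 5.5.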


If hash tables are used, a key number is required because the record entries in 
a hash table are not arranged by the order of their keys. Hash table keys are 
distributed randomly across index pages and are only sorted within a page. The keys in 
a balanced-tree are arranged in a fully sorted pattern and therefore do not need a key 
number. 

One option which can affect application performance and disc overhead is that 
key records can also contain extra or optional data for use only by the application 
program. Once a key record is located within an index, the optional data can be read 
immediately from the key record and thus save an access to the data file. Appending 
extra data to keys makes retrieval of that data very quick, once the key is located. This 
is obtained at the expense of a larger index which would require a longer seek. 
However, a second seek to locate the additional data is no longer necessary. 

3. Overlapping Keys 

Another area in which key record design can affect application performance is 
the overlapping of key fields by other key fields. For example, it might be desirable to 
allow a date field (Year-Month-Day) to be searchable by various overlapping keys as 
seen in Figure 5.6. This overlapped set of keys could be used to search on Year- 
Month-Day (Key 1), Month-Day (Key 2), and Day (Key 3) information. By searching 
for partial matches, Key 1 could also be used to search on Year-Month or Year, and 
Key 2 could be used to search on Month. The same searches could be performed with 
separate Year, Month, and Day fields, but this would mean searching in three separate 
indexes for a Year-Month-Day specification, with much worse than triple the access 
time for this index. (Key, 1986, p. 13) 


VI. CD-ROM INDEXING STRATEGIES 


A. BALANCED-TREE INDEXES 
1. Tree Construction 

The general form of a tree structure on a CD-ROM is similar to that of a 
broad, shallow balanced-tree. Since CD-ROMs are not concerned with insertions and 
deletions, the blocks of the tree can be packed completely full. This results in the tree 
using less space and in each block having a larger number of children. Moreover, a 
broader, shallower tree is produced. 

If balanced-trees are built by inserting records randomly and if procedures 
developed for handling the growth of dynamic trees are used, the blocks of the tree will 
be between 50 and 100 percent full with an average utilization of between 67 and 85 
percent (Zoellick, 1986, p. 184). That is, trees will contain blocks that are not 
completely full. A special tree-loading procedure that does not use the normal block- 
splitting method involved in balanced-tree insertion is needed. 

The first step in developing an appropriate tree-loading procedure is to sort all 
the records by their keys as discussed in Chapter Five. The sorted records are then 
placed one at a time into the leftmost block at the lowest level of the tree. When that 
block is full it is written out to disc. The next record goes into a parent block. Then the 
next block at leaf level is filled. When this second leaf block is full, it is written out to 
disc and another single record is placed in the parent block. This process continues 
until all the records have been loaded. Figure 6.1 shows that all the records are 
arranged in the blocks in a numbered sequence. 
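
The loading procedure can be sketched as below. This is a simplified illustration rather than the procedure actually used for TLOCD: blocks are represented as Python lists, "writing to disc" is modeled by appending to a list, and the function name is assumed.

```python
def bulk_load(sorted_keys, capacity):
    """Pack sorted keys into completely full blocks, bottom-up.

    Returns the blocks in the order they would be written to disc, as
    (level, keys) pairs; level 0 is the leaf level.
    """
    open_blocks = [[]]   # one partially filled block per tree level
    written = []

    def place(key, level):
        if level == len(open_blocks):
            open_blocks.append([])          # the tree grows a new top level
        block = open_blocks[level]
        if len(block) < capacity:
            block.append(key)
        else:
            written.append((level, block))  # block is full: write it out
            open_blocks[level] = []
            place(key, level + 1)           # this record goes to the parent

    for key in sorted_keys:
        place(key, 0)
    for level, block in enumerate(open_blocks):
        if block:                           # flush the blocks still open
            written.append((level, block))
    return written


for level, block in bulk_load(range(1, 9), capacity=2):
    print(level, block)
```

With two records per block, eight records produce three full leaf blocks and one full parent block, and each block is written as soon as it fills.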

The primary advantage of this loading procedure is that it capitalizes on the 
read-only nature of the CD-ROM by building a shallow tree and avoiding seeks. There 
is also an important second advantage. If each block is written out as soon as it is full, 
the parent blocks will be stored in close proximity to their children, making use of the 
CD-ROM's better performance on short, local seeks. Furthermore, the proximity of 
parents and children will never be threatened since the balanced-trees used for CD- 
ROM are not dynamic. 

There are other possibilities for decreasing seeks if something is known about 
the distribution of requests for the records stored in the tree. Say, for example, that it 
is known that 85 percent of the requests are for 10 percent of the records. The number 
of seeks can be greatly reduced if the tree-loading procedure can be designed to place 
the most frequently used records as near the root as possible. 


Figure 6.1 Properly Loaded Balanced-Trees. 

2. Tree Optimization Formulas 
The following formulas were used by Reference Technology in designing the 
TLOCD database: 
• L >= log(N + 1) / log(P + 1)    L is # of tree levels 
• P >= (N + 1)^(1/L) - 1          P is # of key records in an index page 
• N <= (P + 1)^L - 1              N is # of key records in the index 


These formulas relate number of key records, number of tree levels, and page size and 
are used to optimize balanced-tree performance for CD-ROM databases. Table 4 
displays examples of how the formulas can be used. 
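
The three relationships can also be evaluated mechanically. The sketch below uses integer arithmetic to avoid rounding problems; the function names and the 2048-byte sector rounding helper are illustrative assumptions.

```python
import math


def max_records(p, levels):
    """N <= (P + 1)^L - 1"""
    return (p + 1) ** levels - 1


def min_levels(n, p):
    """Smallest L satisfying L >= log(N + 1) / log(P + 1)."""
    levels = 1
    while max_records(p, levels) < n:
        levels += 1
    return levels


def min_records_per_page(n, levels):
    """Smallest P satisfying P >= (N + 1)^(1/L) - 1."""
    p = 1
    while max_records(p, levels) < n:
        p += 1
    return p


def page_bytes(p, key_record_bytes, sector_bytes=2048):
    """Page size for P key records, rounded up to whole sectors."""
    return math.ceil(p * key_record_bytes / sector_bytes) * sector_bytes


print(min_levels(100_000_000, 4096 // 8))       # levels for 100,000,000 keys
print(min_records_per_page(2_000_000, 2))       # records per page for 2 levels
print(page_bytes(min_records_per_page(2_000_000, 2), 32))
print(max_records(4096 // 8, 2))                # capacity of a 2-level tree
```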


TABLE 4 
OPTIMIZING BALANCED-TREE PERFORMANCE 


Given the number of records, page size, and key record size, the minimum number of 
tree levels can be calculated: 


Number of key records = 100,000,000 
Page size = 4096 bytes 
Key record size = 8 bytes 


N + 1 = 100,000,001 
P + 1 = (4096/8) + 1 = 513 
L >= log(N + 1) / log(P + 1) = 2.95 


Since there must be an integral number of levels, 3 levels are required. 


Given the number of tree levels, number of records, and record size, the minimum page 
size can be calculated: 


Number of tree levels = 2 
Number of key records = 2,000,000 
Key record size = 32 bytes 


N + 1 = 2,000,001 
L = 2 
P >= (N + 1)^(1/L) - 1 = 1413.2 


Since there must be an integral number of records on a page, the page size must 
be large enough for 1414 records. If the page size is to be divisible by 2048 bytes (the 
CD-ROM sector size), a 47,104-byte page size is needed. 


Given the number of levels, page size, and key record size, the maximum number of 
records can be determined: 


Number of tree levels = 2 
Page size = 4096 bytes 
Key record size = 8 bytes 


L = 2 
P + 1 = (4096/8) + 1 = 513 
N <= (P + 1)^L - 1 = 263,168 


At most 263,168 records can be placed in this tree. 

Source: Key Record Manager, pp. 21-22. 


B. HASHED INDEXES 
1. Overflow Avoidance 
Hashing fits the strengths and weaknesses of the CD-ROM perfectly for 
applications that do not need to access records in order by key. It consists of using a 
function to transform each record's key into a bucket address within the file. In order 
to find a particular record, the function is applied to that record's key, and the 
bucket at the resulting address is retrieved. Hashing works well and permits single- 
seek retrievals as long as there is room for each record in its associated bucket. 
The following variables can be manipulated to guarantee that overflow does not 
happen: 
• packing density of the hashed storage 
• the size of the buckets 
• the design of the hash function 
Packing density and bucket size are discussed further in the next chapter. 
2. Hashing Functions 
Since CD-ROM is a read-only medium, there exists a complete list of the keys 
to be hashed before the file is built. The keys can be analyzed to discover functions 
that would distribute them more uniformly than a random function would. A perfectly 
uniform distribution would place an equal number of records in each bucket and 
guarantee no overflow even at a packing density of 100 percent. Although developing 
such a function can be very time-consuming, an economical way of improving on 
purely random distributions can often be found. 
The CD-ROM's read-only nature makes it possible to optimize a hash 
function. It is also practical because large computers operating in a batch mode can be 
used to create the data set that will be used interactively by small computers. 
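
One economical way to improve on a single random function, sketched below under stated assumptions, is to try a family of candidate hash functions against the complete key list and keep the one with the least overflow. CRC-32 with different starting values stands in for the family of functions here; the text does not say which functions were actually used, and all names are illustrative.

```python
import zlib


def bucket_loads(keys, n_buckets, seed):
    """Number of records each bucket receives under one candidate function."""
    loads = [0] * n_buckets
    for key in keys:
        address = zlib.crc32(key.encode("utf-8"), seed) % n_buckets
        loads[address] += 1
    return loads


def overflow(loads, bucket_capacity):
    """Records that do not fit in their home bucket."""
    return sum(max(0, load - bucket_capacity) for load in loads)


def best_seed(keys, n_buckets, bucket_capacity, candidates=range(64)):
    """Pick the candidate function with the least overflow for these keys."""
    return min(
        candidates,
        key=lambda seed: overflow(bucket_loads(keys, n_buckets, seed), bucket_capacity),
    )


# Hypothetical key list: 200 keys in 50 buckets of 5 (packing density 80%).
keys = ["key%04d" % n for n in range(200)]
seed = best_seed(keys, n_buckets=50, bucket_capacity=5)
print(seed, overflow(bucket_loads(keys, 50, seed), 5))
```

Because the key list never changes after mastering, this search can run once in batch mode; lowering the packing density or enlarging the buckets reduces the remaining overflow further.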


C. INVERTED FILES 

Inverted files are ideally suited for full-text fields because when used with 
structured fields containing repeating key values they save index space. A copy of each 
key value is stored in an index along with a pointer to a list of all records associated 
with the key. The Comments field in applicable databases is normally a full-text field 
and a good candidate for an inverted index. If each word is used as a key in a key 
record, the same words will occur over and over again and create a very large index. 
An inverted file stores each word only once to represent all of its occurrences and 
results in a much smaller index. Figure 6.2 represents an inverted index for words 
beginning with ‘A’ and ‘B’ from a fictitious database. 


[Figure: a key index with the columns Word, Count, and Pointer, listing each word 
once in alphabetical order (Accelerate, Accurate, Acreage, Act, ... Break, Bright), 
with each pointer leading into a list of record numbers.] 


Source: CD ROM Optical Publishing, p. 115. 


Figure 6.2 Inverted Indexing. 
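
A minimal sketch of an inverted index over a full-text field follows. The comment strings, record numbers, and function names are invented for illustration, and word splitting is simplified to whitespace.

```python
from collections import defaultdict


def build_inverted_index(comments):
    """comments maps a record number to its full-text field."""
    postings = defaultdict(list)
    for record_no in sorted(comments):
        for word in set(comments[record_no].lower().split()):
            postings[word].append(record_no)
    # Each word is stored once, with a count and its list of record numbers.
    return {word: (len(recs), recs) for word, recs in sorted(postings.items())}


def records_with_all(index, words):
    """Boolean AND: records whose text contains every one of the words."""
    sets = [set(index.get(word, (0, []))[1]) for word in words]
    return sorted(set.intersection(*sets)) if sets else []


comments = {1: "Regional bank", 2: "Bank balance", 3: "Blue bank"}
index = build_inverted_index(comments)
print(index["bank"])                                # (3, [1, 2, 3])
print(records_with_all(index, ["bank", "balance"])) # [2]
```

Boolean operations reduce to set operations on the record-number lists, which is why such lists are usually held in a memory buffer while a query is evaluated.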


Such sophisticated indexing schemes can sometimes require as much or more 
space as the data itself. The Grolier Electronic Encyclopedia requires 60 megabytes to 
accommodate the text and 50 megabytes to accommodate the indexing. (Dixon, 1987, 
pp. 10-17) 


D. CHOOSING THE PROPER INDEX STRUCTURE 

Because CD-ROM discs are a read-only medium, the choice of index structure 
must be made when the database is designed. It is possible to use more than one type 
of index on a single database so that it becomes feasible to choose whichever type 
offers the best performance for individual key fields. 


Balanced tree searches are best for applications where partial-match searches are 
frequent. This is because the index can be ordered by the key value. They also perform 
well in exact-match applications when it is desirable to minimize the index size. 
Balanced-trees used in CD-ROM applications waste no space and can typically acquire 
any key in the index with only two or three accesses. 

Hash tables perform best when the quick access of exact matches is the main 
consideration. Normally, hash tables can be constructed so that only one disc access is 
sufficient. However, hash-table indexes are not as compact as balanced-trees and will 
typically be 20 to 50 percent larger than a comparable balanced-tree index. 
Furthermore, hash tables perform partial-match searches poorly because it is nearly the 
same as searching a sequential file. (@elvime 19575 pallies, 

Boolean and relational operations on CD-ROM discs are best supported by 
inverted files. Either hash tables or balanced trees can be used to create the files. Since 
all data record numbers containing a particular key value are listed together in an 
inverted file, it must be loaded into a rather large memory buffer to minimize accesses 
to the CD-ROM. 

The index structure used in the development of TLOCD was a combination of a 
balanced-tree and a hash table. In this way the time required to perform both partial 


and exact-match searches could be minimized. 


VII. CD-ROM FILE MANAGEMENT 


A. GENERAL DIRECTORY STRUCTURE 

The High Sierra Standard entails a hierarchical structure of descending 
subdirectories branching down from the parent directory. This directory structure is 
called a “Standard File Structure,” and there must be only one per CD-ROM disc. A 
path table operates as an index to each subdirectory and provides a pointer to the 
logical block number where the subdirectory is located. A path table obviates the need 
to sort each level of the directory hierarchy in the search through the directory 
structure. Under certain circumstances, the path table can be contained in RAM, 
providing one-seek access to the subdirectory of interest. This occurs when the 
subdirectory names are short enough and the number of subdirectories small enough so 
that the path table can reside in one physical/logical sector. (Approximately 128 
subdirectory names of eight characters each will cause the path table size to be about 
2048 bytes or one logical sector.) Thus, given an eight-level tree, holding a path table 
in RAM saves seven seeks. (Standard, 1986, p. 2.4) 


B. DIRECTORY STRUCTURE DESIGN 
1. Multiple-File Explicit Hierarchies 

This type of directory structure is used by UNIX, MS-DOS, VMS, and other 
magnetic disk systems. Early versions of Digital Equipment's UNIFILE system are an 
example of a CD-ROM file system that used this kind of directory structure. This 
particular structure, as shown in Figure 7.1, allows subdirectories to be treated as files. It 
is an excellent system for magnetic disks because it provides the flexibility required in 
order to add new subdirectories and delete old ones. However, CD-ROMs do not 
require such flexibility. Furthermore, we cannot afford the time to seek from 
subdirectory file to subdirectory file in order to find a file with a long path name such 
as: 


/johnson/programs/source/acctg/ledger/post.c 


The strong features of this type of directory structure are familiarity and the 
fact that it handles generic searches reasonably well. Moreover, by taking advantage of 
the CD-ROM's read-only nature, the files in each subdirectory can be sorted to 
improve generic searching even more. 

The disadvantage is that we must search through an entire level of the 
directory structure while looking for a file. If all the files are in the root then a search 
for a single file would involve the whole directory. Even if the files are sorted within 
each directory level, a binary search of a large single-level directory containing 10,000 
files would require a dozen or more seeks back and forth across the sectors that make 
up the directory. 

2. Single-File Explicit Hierarchies 

This approach to directory hierarchies involves placing the entire directory 
structure in a single file. The root directory and all subdirectories are treated as 
records within a file rather than separate files. Figure 7.2 represents this type of 
structure, which was used in the first version of LaserDos, a tree-oriented system 
designed by TMS, Inc. for optical discs. The left pointers from the subdirectory records 
point to elements in the subdirectory. Right pointers always point to files or 
subdirectories at the same level. 

The important benefit realized from compressing the directory hierarchy into a 
single file, rather than spreading it out by using a different file for each subdirectory, is 
that we can often cut down on the number of seeks required to open a file. A 
somewhat small directory containing no more than two hundred files can be contained 
in just two or three sectors which could easily fit in RAM. This holds true even if there 
are many levels of subdirectories. Therefore, the single-file explicit hierarchy can often 
improve on the performance of multiple-file explicit hierarchies when opening files that 
have path names containing several subdirectory levels. 

3. Hashed Directories 

Any file can be opened in one seek if we hash the entire path and file name to 
an address within the directory. This will work even if there are tens of thousands of 
files on the disc. A hash function would transform the character string representing the 
path and file name into the address of a hash bucket. A seek to the directory bucket 
would gain access to the information needed to open a file. 

If the hash buckets can be prevented from overflowing, then it can be 
guaranteed that the hashing procedure would require no more than a single seek. If 
overflow occurred, one or more seeks would be required in order to locate the 
information that had to be stored elsewhere. The read-only nature of the CD-ROM 
makes it possible to manipulate the packing density of the directory file. Overflow can 
be avoided by placing a small number of records into a large file. The more tightly a 
file is packed, the more likely it is that at least one bucket will overflow. The bucket 
size also affects overflow. No overflow could be guaranteed if the entire file was 
considered to be a single bucket. Unfortunately, the entire file would have to be read 
into and processed in RAM. 

4. Indexed Directories 

The key to the success of this approach is a structure called a path table. The 
path table provides a compact mechanism for quick translation of the full path for a 
subdirectory into an integer called the path identifier. The path identifier is actually the 
relative position of each subdirectory obtained from a level order traversal of the 
directory hierarchy. By examining Figure 7.3 the path identifiers for the following path 
names can be determined: 

• /strlib = 1 
• /mathlib = 2 
• /text = 3 
• /strlib/obj = 5 
• /mathlib/source = 7 
• /text/reports = 10 
• /text/specs/input = 14 

The path table's ability to compress an entire directory path into a two-byte integer 
guarantees that directory records can be kept relatively short and that many directory 
records can be put into each block of the directory structure. 





[Figure: directory tree. The root directory has three subdirectories, STRLIB, 
MATHLIB, and TEXT; the lower levels include SOURCE, REPORTS, and SPECS 
subdirectories.] 


Source: CD-ROM The New Papyrus, p. 119. 


Figure 7.3 Indexed Directory Structure. 


After performing an average seek of about .5 seconds, a minimum of one 2K- 
byte sector is read in from the disc. For an additional cost of about 40 milliseconds 
(6K bytes at the CD-ROM transfer rate of 150K bytes per second) another 
6K bytes can be read in, making a total of 8K bytes in all. If the size of the directory 
records can be held to 32 bytes each, then each seek out to the CD-ROM can bring in 
as many as 256 records for an 8K block. 

The file records are placed into the blocks of a file table which contain the 
information needed to open any file in the file system. They are arranged according to 
their path identifier, which was extracted from the path table. As a result, all the files in 
a single subdirectory are grouped together (i.e., they have the same path identifier) and 
then ordered by name. This structure supports efficient generic and binary searching. 


When a particular file is to be opened, we need to find the block in the file 
table that has the record corresponding to the desired path identifier and file name. The 
costly part of the file search is the seek to the block's beginning, so it is desired to find 
the right block on the first attempt. To ensure this occurs, an index table is used to tell 
the path and file names that are at the block boundaries. Figure 7.4 displays an 
overview of the contents in the file table. Now suppose the file to be opened is: 

/strlib/source/strchop.c 

It is shown in Figure 7.4 that the request starts at the path table and converts the path 
name into a path identifier of ‘4’. The index table is then searched for “4strchop.c”. 
Since the value of “4strchop.c” is less than the first entry (alphabetically), it follows the 
first pointer from the index table to find the first block in the file table where it finds 
the location of the file and other information required to open it. 

The path table's compression ability allows for short directory records so that 
many of them can be packed into each block of the file table. This reduces the total 
number of blocks required for the file table. A small file table will result in a small 
index table. It would be very desirable to store both the index and path tables in RAM 
rather than forcing a disc seek every time we needed them. In this way the indexed 
directory allows the opening of any file with only one seek to the CD-ROM. 
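
The lookup sequence can be sketched as follows. Only the path identifiers and the file name strchop.c come from the text; the other file names, the disc locations, the block contents, and the use of (path identifier, name) pairs in place of concatenated strings such as “4strchop.c” are assumptions made for the example.

```python
import bisect

# Path table (held in RAM): subdirectory path -> path identifier.
path_table = {"/strlib": 1, "/mathlib": 2, "/text": 3,
              "/strlib/source": 4, "/strlib/obj": 5}

# File table (on disc): blocks of records sorted by (path identifier, name).
# The locations are hypothetical sector numbers.
file_table = [
    [((4, "strappend.c"), 5000), ((4, "strchop.c"), 5100)],
    [((4, "strcopy.c"), 5200), ((5, "strchop.o"), 7000)],
]

# Index table (held in RAM): the last key of every block except the final one.
index_table = [block[-1][0] for block in file_table[:-1]]


def open_file(path):
    """Return the location of the file, reading one file-table block."""
    directory, _, name = path.rpartition("/")
    key = (path_table[directory], name)
    block_no = bisect.bisect_left(index_table, key)  # search done in RAM
    block = file_table[block_no]                     # the single seek
    for record_key, location in block:
        if record_key == key:
            return location
    return None


print(open_file("/strlib/source/strchop.c"))   # 5100
```

Because the path table and index table are searched in memory, the only disc access in open_file is the read of one file-table block.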


C. BLOCKS AND BUFFERS 
1. Determining Block Size 

A general rule applying to any file structure design is to make each disc seek 
as profitable as possible. This is the reason why paged structures such as balanced-trees 
are commonly utilized. Each access to the disc retrieves enough data to make decisions 
about the next tree level instead of making a simple two-way choice in a binary tree. 
The disc is never accessed to retrieve only one record but to retrieve a block of records 
that can be read and processed much faster in RAM. Even though a CD-ROM seeks 
slowly, it can acquire a large block of data at an acceptable rate. Therefore, the choice 
of the block size is extremely important. 

Both physical and logical design factors should be considered when selecting a 
block size. Consider the effect of page size on the depth of the trees previously shown 
in Figure 6.1. A page that holds N records can have N+1 children. The first tree in 
Figure 6.1 has a height of two levels and holds eight records. This height is ideal 
because storing the tree's root page in RAM ensures a one-seek retrieval of any record 
in the tree. Records can be added to the tree by adding more levels. However, this will 
increase the average number of seeks required for searching. A better plan calls for 
increasing the block size to accommodate more records. The second tree in Figure 6.1 
shows the result of doubling the block size. 

Since the CD-ROM is read-only, it is known exactly how many records are
going to be put into the tree before it is built. For example, storing 50,000 32-byte
records and using a block size of 2K will result in a three-level tree. A two-level tree
can be built if a block size of 8K is used. It takes longer to read a larger block, but
since CD-ROMs can read data at 150K bytes per second, reading an additional 6K
bytes takes only about 40 milliseconds. This is a small price to pay in return for
avoiding an additional 500-millisecond seek. Minimizing the number of seeks is the
logical consideration for using large block sizes. However, the CD-ROM's physical
features should also be considered in determining what block size to use.
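
The block-size arithmetic in this example can be checked with a short sketch. It
assumes, as stated above, that a page holding N records has N+1 children, and it uses
the 150K-byte-per-second transfer rate quoted in the text. The function names are
illustrative only and are not part of any CD-ROM software package.

```python
def levels_needed(num_records, record_size, block_size):
    """Smallest number of tree levels whose pages can hold num_records."""
    per_page = block_size // record_size   # N records per page
    fanout = per_page + 1                  # N + 1 children per page
    capacity, pages, levels = 0, 1, 0
    while capacity < num_records:
        capacity += pages * per_page       # records stored on this level
        pages *= fanout                    # pages available on the next level
        levels += 1
    return levels


def transfer_ms(num_bytes, rate_bytes_per_s=150 * 1024):
    """Milliseconds needed to read num_bytes at the sustained transfer rate."""
    return 1000.0 * num_bytes / rate_bytes_per_s


print(levels_needed(50_000, 32, 2 * 1024))   # tree depth with 2K blocks
print(levels_needed(50_000, 32, 8 * 1024))   # tree depth with 8K blocks
print(transfer_ms(6 * 1024))                 # cost of reading 6K more bytes
```

Comparing the extra transfer time with the cost of one additional seek shows why the
larger block is the better trade.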

Since the sector size for a CD-ROM is 2K bytes, the smallest block size that
should ever be considered is also 2K bytes. This is due to the fact that even if only one
byte is needed, 2K bytes will be retrieved. An effective operating system will transfer
the data directly into an application program’s work area with no intermediate data
movement. So what happens if a program requests only 64 bytes, or some other sector
fragment? In this case the operating system cannot assume that the application
program has allotted enough space to hold an entire 2048-byte sector. A system buffer
must be used to hold the complete sector until the 64 bytes desired can be transferred
to the application's work area. Data must be handled or moved twice when anything
less than a complete sector is requested. Therefore, in order to avoid unnecessary data
movement, a block size that is a multiple of the 2K-byte sector size should be used.
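
The sector rule can be expressed as a minimal sketch. The 2048-byte sector size comes
from the text; the two helper functions are hypothetical and only illustrate when a
request would have to pass through a system buffer.

```python
SECTOR_SIZE = 2048  # bytes per CD-ROM sector, as stated in the text


def round_up_to_sector(num_bytes):
    """Smallest multiple of the sector size that can hold num_bytes."""
    sectors = -(-num_bytes // SECTOR_SIZE)   # ceiling division
    return max(sectors, 1) * SECTOR_SIZE


def needs_system_buffer(offset, length):
    """True when a request is a sector fragment, so the sector must be
    staged in a system buffer and the data moved a second time."""
    return offset % SECTOR_SIZE != 0 or length % SECTOR_SIZE != 0


print(round_up_to_sector(5000))          # block size for a 5000-byte page
print(needs_system_buffer(0, 64))        # 64-byte fragment
print(needs_system_buffer(4096, 8192))   # whole sectors, sector-aligned
```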


2. Buffer Usage 

Reading data in multiples of the sector size results in bypassing the system
buffers. This blocks the operating system from keeping recently used data in RAM.
For example, when a 256-byte record is read, the operating system uses one of its
system buffers to hold the sector containing the record. Now another 256-byte record
is read in from a different sector. This new sector is placed in a different buffer. The
program now calls for a third record which happens to be in the sector which was
placed in the first buffer. Therefore, no seek is required for the third record because its
sector is already buffered in RAM.

Now suppose instead of reading fragmented records, 2K bytes are read to
avoid moving data twice. In this case, system buffers are not used because the data
goes directly to the application work area. Consequently, a new sector would be read on
top of the first one. How many buffers to provide, and how to manage them so that a
CD-ROM application benefits from buffering, depends on the nature of the
application. If the application searches through tree-structured indexes or
works in both directions through a sequence, it can benefit from a large number of
buffers. If the application moves sequentially through the data in one direction it will
not benefit from buffering at all.

Reference Technology utilizes a general purpose buffering scheme known as 
Least Recently Used (LRU) replacement. Information in the buffers is retained for user 
access until buffered data are replaced, according to the least recently accessed 
protocol. Best performance occurs when the page size is the same as the buffer size and
when the number of buffers selected is sufficient to retain the most frequently accessed 
pages in memory. 
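
A minimal sketch of Least Recently Used replacement follows. It is not Reference
Technology's implementation; the class name, the read_page callable, and the counter
are assumptions introduced purely for illustration.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Keeps at most num_buffers pages, evicting the least recently used."""

    def __init__(self, num_buffers, read_page):
        self.num_buffers = num_buffers
        self.read_page = read_page      # hypothetical callable: page number -> bytes
        self.pages = OrderedDict()      # page number -> data, oldest first
        self.disc_reads = 0

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)      # mark as most recently used
            return self.pages[page_no]
        data = self.read_page(page_no)           # would cost a seek on a real drive
        self.disc_reads += 1
        self.pages[page_no] = data
        if len(self.pages) > self.num_buffers:
            self.pages.popitem(last=False)       # drop the least recently used page
        return data


pool = LRUBufferPool(2, read_page=lambda n: b"page %d" % n)
for page in (0, 1, 0, 2, 0):
    pool.get(page)
print(pool.disc_reads)   # number of requests that actually went to the disc
```

With two buffers, the frequently requested page 0 stays resident while pages 1 and 2
compete for the other buffer.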

Because applications differ, it is impossible to ensure that the most frequently
accessed pages will always remain in the buffers. A procedure is needed that will select
the minimum number of buffers for maximum performance. Such a procedure would
require that there be at least one more buffer than the number of levels in the tree.
Also, there should be at least two buffers for each hash table. The extra buffer per
index will hold the data record, while the others hold the index pages. Thus, if two tree
indexes with two levels each and two hash indexes were frequently accessed, all with
4096-byte index pages, then (2 x 3) + (2 x 2) = 10 buffers of 4096 bytes each would be
the minimum configuration for best performance. (Key, 1986, p. 23)


D. MULTI-VOLUME DISCS
1. Adding Additional Discs 

A CD-ROM disc is described, according to the High Sierra standard, as a
volume (Standard, 1986, p. 2.5). The standard allows for multi-volume sets of discs,
which are of two basic kinds. The first is the type of multi-volume set designed to hold
a single massive database that exceeds the capacity of a single disc. The path table and
directory structure on each volume of this kind are required to be the same. In this way,
the location of any file in the set can be found by reading the directory from any one
of the discs. Clearly, it may become necessary to mount a different CD-ROM disc from
the set in order to read that file. However, the presence of identical path tables and
directories avoids the need to mount disc after disc to find the file of interest.

The second type of multi-volume set of CD-ROMs is necessitated by the need
to update files or add new volumes to an existing volume set. If this is the case, the
most recent volume’s path and directory information must supersede that of all
previous volumes. Moreover, the last volume in the set must be mounted when the
system is booted in order to supply the system with the freshest information. By
excluding references to a file, or including references to a file in the directory structure of
the latest disc in the updating volume set, existing files can be “deleted,” “modified,” or
“replaced.” They actually still exist on the earlier discs but since the latest directory no
longer points to them, they are no longer available to the system. Although physically
present for the life of the CD-ROM, they are logically lost or altered under the present
configuration when the new volume is mounted. However, they can be restored if an
earlier volume in the set is mounted at system start-up.
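
The updating behavior can be illustrated with a small sketch. The directory layout,
file names, and locations below are hypothetical; only the rule that the most recently
issued volume's directory supersedes all earlier ones is taken from the text.

```python
def resolve(volumes, path):
    """Return (volume_number, location) for path, or None if the file is not
    listed in the directory of the most recent mounted volume.

    volumes is a list of directories, oldest first. Each directory maps a
    path to the (volume_number, location) where the file's data resides.
    """
    latest_directory = volumes[-1]       # the latest volume supersedes the rest
    return latest_directory.get(path)


# Hypothetical two-disc updating set.
volume1 = {"PARTS.DAT": (1, 500), "KEEP.TXT": (1, 700), "OLD.TXT": (1, 900)}
volume2 = {"PARTS.DAT": (2, 120),   # "replaced": now points at data on disc 2
           "KEEP.TXT": (1, 700),    # unchanged: still points at disc 1
           "NEW.TXT": (2, 300)}     # added; OLD.TXT is "deleted" by omission

print(resolve([volume1, volume2], "PARTS.DAT"))
print(resolve([volume1, volume2], "OLD.TXT"))
print(resolve([volume1], "OLD.TXT"))   # restored by mounting the earlier volume
```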

2. Extended Attribute Records 

CD-ROM file management that is supported within operating systems such as
PC-DOS sees optical disc data as simply a stream of bytes. For other operating
environments, extended attribute records (XARs) can provide additional information
about the file and its structure. An XAR is an optional attachment to the beginning of
a file, containing extra information about that file. Examples of such additional data
include creation and expiration dates, access control, record structure, record
attributes, and application-specific information.

One particular use of XARs is to control which version of a file is to be used
when there is a multi-volume set of discs containing several versions of a file. This
works because the XAR affixed to the last extent of a given file supersedes the XARs
affixed to all the other previous extents of that file. If there is no XAR with the last file
extent, the XARs with preceding extents are ignored. Thus, by altering the XAR for
the final extent of a file, the incidental information about a file is effectively updated
when a new CD-ROM is issued.
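
The precedence rule for XARs is simple enough to state as a sketch. The dictionary
representation of an extent is an assumption made for illustration and does not
reflect the on-disc layout defined by the standard.

```python
def effective_xar(extents):
    """Return the XAR that governs a file, given its extents in order.

    Each extent is a dict with an optional "xar" entry. Only the XAR on the
    final extent counts; XARs attached to earlier extents are ignored.
    """
    if not extents:
        return None
    return extents[-1].get("xar")


first_issue = {"xar": {"version": 1}}
update_issue = {"xar": {"version": 2}}
print(effective_xar([first_issue, update_issue]))   # XAR of the last extent
print(effective_xar([first_issue, {}]))             # last extent has no XAR
```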

Another use of XARs is to restrict who may read certain files on a disc. The
standard is similar to the VMS “system, owner, group, world” permission design. It
should be noted that access restriction only works under those operating systems that
recognize it. If someone carries a disc with restricted files to a computer whose
operating system, like MS-DOS, does not recognize access protection, the system will
read the disc, regardless of the setting of the XAR. Consequently, designing access
restriction into a disc must be coupled with a plan to restrict the physical distribution
of the discs. (Standard, 1986)


VIII. CD-ROM APPLICATION SOFTWARE CONSIDERATIONS


A. FILE SYSTEM SUPPORT
1. Origination Software 

Before making a CD-ROM, the files that will appear on the disc must be 
assembled according to the rules of the logical format. Origination software does this 
work, providing the writing component of the file system. 

At the present time, most origination software runs on minicomputers in
batch mode. Figure 8.1 shows the relationship of the four principal components of
TMS's LaserDOS origination system. The user begins with a Specify program that
provides an interactive shell-like mechanism for creating the directory hierarchy that is
to be used on the CD-ROM. During this step the user can indicate which files are to
go in which subdirectories. The specification is used as input to a Load process that
reads user files from tape and magnetic disk to create a disc image, complete with a
volume table of contents and directory structure in the logical format that will be used
on the CD-ROM. After loading, the user can run a Verify program that automatically
checks the internal consistency and integrity of the disc image. The user can also run a
Shell program that exercises the image of the CD-ROM file system interactively,
allowing the user to dump out the contents of individual files, copy files to the host
operating system, and so on.

2. Destination Software 

Destination software is the reading component of the file system. It
understands the logical format and uses it to provide access to the CD-ROM files. One
way to approach the design of destination software is to create a file manager program
containing special function calls that are exclusively for use with the CD-ROM and
which bear no relationship to the system calls provided by the host operating system
(Zoellick, 1986, p. 125). The advantage of this approach is that the file manager and
application programs that use it are not affected by changes in the operating system,
thus allowing a higher degree of portability. The main disadvantage is that applications
cannot access the CD-ROM through standard system calls, which in turn prevents
access via high-level language I/O facilities. This makes the CD-ROM less user friendly
since familiar language tools and system utilities are unavailable.

Another design approach involves software such as TMS’s LaserDOS and
Reference Technology’s Standard File Manager, which are implemented for use with
PC-DOS (Zoellick, 1986, p. 126). The approach’s intent is to cooperate with the host
system as much as possible. For example, LaserDOS traps all system calls and
determines if the call is CD-ROM related. If it is CD-ROM related it will handle the
call itself. If it is not, it simply passes the call on to MS-DOS for completion. The
calling software cannot tell the difference. Reference Technology’s
Standard File Manager works similarly in the TLOCD system. The CD-ROM appears
as just another disk drive to the TLOCD user.
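
The trapping approach can be sketched as a simple dispatcher. This is not how
LaserDOS or the Standard File Manager is actually coded; the class, the drive-letter
test, and the two handler callables are assumptions used to show the routing idea.

```python
class CallTrap:
    """Routes file-system calls either to a CD-ROM handler or to the host."""

    def __init__(self, cdrom_drive, cdrom_handler, host_handler):
        self.cdrom_drive = cdrom_drive.upper()   # e.g. "D:"
        self.cdrom_handler = cdrom_handler
        self.host_handler = host_handler

    def call(self, function, path, *args):
        drive = path[:2].upper()
        if drive == self.cdrom_drive:
            return self.cdrom_handler(function, path, *args)   # handled here
        return self.host_handler(function, path, *args)        # passed through


trap = CallTrap("D:",
                cdrom_handler=lambda f, p, *a: ("cdrom", f, p),
                host_handler=lambda f, p, *a: ("host", f, p))
print(trap.call("open", "D:\\TLOCD\\INDEX.DAT"))
print(trap.call("open", "C:\\AUTOEXEC.BAT"))
```

The caller uses the same call for both drives, which is what makes the CD-ROM look
like an ordinary disk.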


B. COMPILER LIMITATIONS 

Some compilers used in writing applications that address the file system can in
themselves limit the size of files. For example, MS-PASCAL (TM) (version 3.1X)
limits the size of files to eight megabytes. C86 (TM) (version 1.2) has the same
limit. Lattice C (TM) (version 2.1X), on the other hand, is not limited in this way.
Reference Technology's Standard File Manager limits itself to file sizes of two Gbytes,
but the compiler must be capable of producing code that can access a file of this size.
PC-DOS has the same two-Gbyte file size limitation as the Standard File Manager if
files are accessed through the Standard File Manager “file handling” functions.
(Standard, 1986, p. 2.12)

Another potential limitation from compilers is that some restrict the number of
files that can be open at one time. For instance, Lattice C (TM) (version 2.1X) has a
limit of 20, including the standard input, output, and error files, as well as any hard
disk or diskette files. The Standard File Manager for CD-ROM systems allows up to
200 files to be open simultaneously.


C. PC-DOS ADAPTATION 

One of the more frustrating things about using CD-ROM with IBM PCs is the
limitation placed on the size of a logical disc volume by the PC-DOS operating system.
It is only 32 megabytes--a mere thimbleful compared to the 540 megabytes typically
available on a single CD-ROM. Fortunately, there are several ways to sidestep this
limitation. One relatively easy way is to surrender to PC-DOS and break the disc into
32-megabyte partitions.
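
As a rough check on the partitioning approach, the number of logical volumes needed
follows directly from the two figures given above. The function name is illustrative.

```python
import math


def partitions_needed(disc_megabytes, limit_megabytes=32):
    """Number of logical volumes needed to cover the whole disc."""
    return math.ceil(disc_megabytes / limit_megabytes)


print(partitions_needed(540))   # logical volumes for a 540-megabyte disc
```

A full disc therefore appears to the user as well over a dozen separate drives, which
is one reason the interrupt-handler method is more attractive.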

However, the most powerful method to get around the size limitation involves a
new interrupt handler. A new file-management system and directory may also be
necessary, depending on how the particular system is set up. By trapping the
operating system interrupt, the interrupt handler can intercept calls intended for the
CD-ROM while other calls are simply passed through. Once intercepted, the CD-ROM
calls can be treated differently, still maintaining system transparency to the user.

The difficulty arises when the interrupt handler must also support every disc call
in exactly the same way as PC-DOS supports them. Those calls include functions that
open files, read from files, check for remaining disc space, and so forth. Supporting all
of those functions necessitates a tremendous amount of code.


IX. ANALYSIS AND DISCUSSION


A. SHIPBOARD USE OF CD-ROM 
1. Departmental Applications 

There are many applications for CD-ROM systems on board U.S. Navy
vessels. Such applications will decrease the ship’s weight (by eliminating paper storage
media) and make more space available. The advantages, disadvantages, and
possible problem solutions are addressed.

The Navigation department should store its hundreds of charts on CD-ROM
and eliminate a majority of its bulky chart cabinets. The system would store the charts
in ascending order according to chart number and would also provide a cross-reference
index for user assistance. The system would prompt the user to enter the number of the
chart he wishes to see and then display that chart on the monitor. However, there
must be a system on board for reproducing these charts into a paper medium so that
corrections, courses, fixes, and coordinates can still be plotted. The technology needed
to reproduce NOAA charts in various scales is now available from LaserPlot
(Belanger, 1987, p. 15).

The Operations department should use CD-ROM to hold its classified
publications. Security will be better because there will be fewer classified materials to
be monitored. Confidential material would be kept on one CD-ROM, Secret material
on another, and Top Secret material on still another. However, in environments such
as MS-DOS, security becomes breached when a person with the “need to know” about
a certain topic has access to all other classified information that resides on the disc he
happens to be reading. In that case, software would have to be developed in which the
CMS custodian would control a “read denial” lock for each classified file. The
operating system would not relinquish control to the CD-ROM file manager without
checking the lock status. The lock could only be set or reset according to a program
executed by the CMS custodian. No file could be opened and read without the
custodian’s knowledge and approval. An individual would sign for the CD-ROM and
the CMS custodian would release the locks on those files that the user is qualified to
read. Upon the return of the classified disc the lock would be reset. Another
particularly helpful CD-ROM application in the Operations area involves “signal
breaking” or tactical communications. Such an application should be written to search
through tactical publications such as NWPs and AWPs and break coded signals,
thereby ensuring timeliness and accuracy in situations that can be and often are
critical. The tactical officer would key in the coded signal phrase and the system would
search its database for that particular sequence of words. The results would be
displayed on monitors located on the ship’s bridge and in CIC.

The Engineering department maintains a vast number of operating manuals,
technical manuals, repair manuals, and schematics. The transfer of these from paper to
CD-ROM would certainly reduce weight and increase available departmental space.
The engineers would also have access to many more manuals, blueprints, and technical
publications not normally carried on board. But how is a repairman going to get a
repair manual to the scene of repair? Must he go to a CD-ROM reader and print out
the applicable pages? The answer is a qualified yes. A repairman will usually have to go
to a centralized location to check out a manual. Disc readers and printers should be
placed in these strategic locations in order to minimize the inconvenience. In certain
circumstances, with the use of some advanced technology, a print-out may not be
necessary.

The Supply department should use CD-ROM to store its wide variety of
catalogs, parts lists, and various other publications. Cookbooks and recipes would no
longer be lost or misplaced. All of these potential uses would be complemented by the
CD-ROM's ability to store visual images. The supply clerks can see exactly what they
are ordering and thereby reduce errors that often result from making assumptions or
guessing about item uncertainties. Moreover, CD-ROMs already contain the Navy
Management Data List (NMDL) and Parts I and II of the Master Repair Items
Listing (MRIL), which is distributed by the Navy Publications and Printing Service.
NAVSUP also sponsored the TLOCD project done here at the Naval Postgraduate
School.

The Administration department would no longer have to print and distribute
copies of Navy-wide regulations and instructions throughout the ship. The drawback
here is a lack of shipboard portability. For example, the person desiring the
information must be in the immediate vicinity of a CD-ROM disc reader. He cannot go
to his stateroom, relax, and thumb through the newest instruction or regulation--
unless, of course, there happens to be a CD-ROM disc reader in his stateroom. This
scenario is not unrealistic. Considering that the total cost of a disc reader, monitor,
keyboard, and printer can be held under $1,500.00, it is feasible that such a system
could be placed in nearly all the spaces on board the ship. Costs could be reduced
further if a networking system were implemented and public terminals made available
to the crew. One possible networking scheme would involve a modem-to-modem
machine interface using the ship’s telephone lines. However, this method might
interrupt routine shipboard communications by tying up the phone lines. A better
solution would involve the development of a local area network (LAN) which would
allow as many users as there were system hook-ups. Each compartment would be
wired so that portable terminals could be supported. The structure would be relatively
simple for such a system and could be supported by a common network topology such
as a ring. The decision to implement a LAN or to pursue a certain network topology
across a particular class of ships should be made by NAVSEA based upon fleet
managerial requirements determined by individual ship needs.

2. CD-ROM Impact on the Paperless Ship

Every officer and petty officer aboard every Navy ship has at one time or
another become frustrated by the unending flow of required paperwork and the
plethora of information in technical manuals and documents that must be available,
read, and studied. Cumulatively, their weight is in tons. VADM J. Metcalf III states,


“I find it mind-boggling. We do not shoot paper at the enemy. We do not train
sailors to be registrars and correctors of publications. I want those guys worried
about fighting, not worrying about keeping up the publications.”


The admiral has launched an initiative to create a “paperless” ship by 1990 as a first
step toward driving paper from the entire fleet. The first ship would be a frigate, he
said, that will probably be equipped with different types of electronic information
systems. (Metcalf, 1987, p. 35)

CD-ROM technology is only a piece of the puzzle when it comes to putting
together such a system. One must consider the feasibility of making CD-ROM disc
readers accessible to all departmental and divisional offices as well as in CIC, DCC, the
Bridge, engineering spaces, and staterooms. The initial cost would be considerable but
would be offset in a short while by the reduction in mailing costs of optical discs as
opposed to paper. See Figure 9.1 for a comparison between mailing costs of CD-ROM
and other storage media.

Keyboards, monitors, printers, and disc readers must be kept in a relatively
cool environment in order to reduce downtime and maintain operational readiness.
Many ships are not currently capable of producing such an environment with any
consistency--especially in humid climates such as the Persian Gulf, Indian Ocean, or
Caribbean Sea. The newer ship classes, however, should not experience as many
problems because of additional electronics needs being addressed in the ships’ original
design. Furthermore, the loss of ship’s power could prevent timely access to important
data. In that case, it would be necessary that a paper copy of such data be stored on
board. An alternative solution would be to require each major user to have his own
back-up power source such as a UPS (uninterruptible power supply) which runs off
its own battery pack until a diesel or gas engine is started and begins to produce
power. It is possible to have a UPS for the entire shipboard computer system
but it would require larger battery packs. The decision on how to employ UPS is again
strictly a managerial one based on individual ship characteristics and goals.

Another problem that surfaces involves applications such as personnel or
disbursing transactions that require constant change or update. Write Once Read
Many (WORM) optical technology may be the solution in these cases. Other emerging
technology that may be available in the near future includes erasable optical discs
which function in much the same way as a standard floppy disk. The goal of a
paperless ship is certainly obtainable if CD-ROM is used in conjunction with other
electronic media such as WORM. However, in order for this to happen, ships must
maintain a cool operating environment, shipboard portability issues must be resolved,
and the use of additional electronic data storage methods to compensate for the
CD-ROM's weaknesses must be available and cost effective.


B. CD-ROM FOR SHORE FACILITIES 
1. Database Design 

The use of CD-ROM at U.S. Navy shore facilities must be tailored to fit the
needs of the particular command. The storage and retrieval of massive amounts of
historical data is the primary consideration for implementing a CD-ROM system such
as the TLOCD system at NSC Oakland. Database design demands considerable
attention from facilities wishing to effectively capitalize on the read-only nature of
CD-ROM technology. Of particular concern is the format of the database. CD-ROM
databases may consist of a number of files--each file consisting of similar records
having the same logical format. Since a database from a CD-ROM perspective is a
collection of similar files concatenated together, a single optical disc may contain many
distinct databases of different file types. In this case, the TLOCD system actually
involves three distinct databases--one each for the transaction files, closing balance
files, and audit trail files.

When designing a database, attempts should be made to maximize the 
system's storage allocation potential. This consideration was neglected in the TLOCD 
design. Consequently, many of the records in each of its three databases contain data 
common to records in the other two databases. For example, the National Item 
Identification Number (NIIN) and date fields are found in all three record types of the
TLOCD system. This data redundancy across databases should be avoided whenever
possible in order to achieve a higher level of storage efficiency. 

Care should be taken not to merge separate entities such as the TLOCD
databases in an attempt to delete redundant information. Such an attempt could lead
to wasted space, continued data redundancy, and unwanted loss of valuable data. Note
Figure 9.2 in which three fictitious file tables of the TLOCD system are merged into a
single table made up of tuples that represent data records. Notice that there are no
entries in some of the record fields. The space must still be maintained and is virtually
wasted. Now notice the data redundancy among the record fields. Furthermore, if a
record were ever to be damaged or destroyed the audit trail data for that date would be
lost, resulting in an inaccurate historical account of inventory items. That is the reason
why multiple entities should not be routinely merged into a single table to reduce
redundancy when designing a database for a particular system.

2. Cost Effectiveness

Businesses today are constantly in search of managerial tools and
manufacturing procedures that reduce overhead and still maintain product reliability.
The U.S. Navy is no different. There are two specific areas in CD-ROM projects such
as TLOCD where costs could be trimmed. The first such area deals with indexing. The
total cost for preparing and creating the TLOCD indexes exceeded $9,000 (Lind, 1986,
p. 59). The Navy may benefit from providing its own indexing and utilizing $9,000 in
cost savings elsewhere. Any Navy facility with sufficient computer hardware can create
the indexes required for CD-ROM manufacturing. In fact, there are hardware and
software units now available that can perform all stages of CD-ROM production
through the premastering stage. The “CD Publisher” from VideoTools is one such
product. However, it would be a simple task to assign the job of indexing to a
minicomputer which could grind out the results in batch mode. The main concern would be
in deciding the type of index structure to use for the particular application in order to
maximize performance. Therefore, some knowledge of CD-ROM indexing would be
essential.

The second area in which costs could be trimmed involves application
software. The TLOCD application specific software was created at a cost of about
$4,500. Qualified Navy personnel can create programs to access the TLOCD database
using the library of C language functions already resident in the Key Record Manager.
Programmers having experience in a high level language should be able to develop
sufficient C programming skills within a short time and then produce programs for
TLOCD and other naval applications. Granted, it is necessary to purchase software
such as the Key Record Manager to interface with the CD-ROM file management
system or else write an independent interface. However, that might not seem very
prudent since the time and cost to develop and debug such an interface would certainly
prove more costly than an already proven product such as Key Record Manager which
has been sold commercially for under $200. Furthermore, such a task would require a
great deal of systems programming in a language such as C at a time when DoD has
declared ADA to be the primary language to be utilized in future military projects.
Since most CD-ROM access software on the market today is C-language oriented, the
Navy should direct research toward developing ADA programs to drive CD-ROM
applications. There are indications from the CD-ROM industry that ADA interfaces
will be available on the consumer market within a few months. An alternative to this
approach would be an interface written to accommodate any compiled code
recognizable in the operating system extensions, therefore allowing several different
compiled languages to access it.


C. TLOCD PROTOTYPE IMPROVEMENT 
1. Proposed System Modification 

As stated previously, the current TLOCD system accesses and searches three
distinct databases in order to obtain transaction, closing balance, and audit trail
information for inventory item inquiries. The system should be modified by extracting
the redundant data from the databases without destroying the separate entities or
relations among the three file types. This could be accomplished by restructuring the
files. Duplicate data would be removed from the three files and placed in a separate
table or “NIIN file” which is then linked to the other tables via multiple pointers from
the NIIN table or via a chaining mechanism from one table to the next. Although the
number of tables is now increased by one, such an arrangement does not imply
inefficiency. The data storage capacity is increased and the tables remain as separate
entities to be used for other purposes. This new structure would provide three TLOCD
files without duplicate data in such a way that the separate entities associated with the
TLOCD files each have attributes that apply to that particular entity. Therefore, the
storage requirement is reduced without removing the idea of separate entities--which is
a requirement for TLOCD system control.
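
One possible shape for the restructured files is sketched below using the
multiple-pointer variant. All field names and values are hypothetical, since the actual
TLOCD record layouts are not reproduced here; the sketch only shows the NIIN stored
once, with pointers into three files that remain separate.

```python
# Hypothetical data: one NIIN entry pointing at record positions in each file.
niin_table = {
    "001234567": {"transactions": [0, 1],
                  "closing_balances": [0],
                  "audit_trail": [0]},
}
transactions = [{"date": 87001, "quantity": 5}, {"date": 87015, "quantity": -2}]
closing_balances = [{"date": 87031, "on_hand": 3}]
audit_trail = [{"date": 87015, "action": "issue"}]

FILES = {"transactions": transactions,
         "closing_balances": closing_balances,
         "audit_trail": audit_trail}


def records_for(niin, file_name):
    """Follow the NIIN table's pointers into one of the three files."""
    entry = niin_table[niin]
    return [FILES[file_name][i] for i in entry[file_name]]


print(records_for("001234567", "transactions"))
print(records_for("001234567", "audit_trail"))
```

None of the three files repeats the NIIN, yet each can still be read on its own.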

2. Functional Design Issues

In designing a system such as TLOCD there are three issues of primary
concern: database access, data search, and data retrieval. These criteria will now be
discussed in relationship with the proposed TLOCD modifications.

Accessing the TLOCD database involves locating and “opening” its index and
data files. The access function must search the CD-ROM database directory for the
database name provided by the user or the user's program. The address of a File
Control Block (FCB) is acquired from the database directory. The FCB will contain a
pointer to a list of the key record indexes used for searching the database. It also will
contain a pointer to the beginning address of the actual data on the CD-ROM. This
“double-pointer” configuration allows the system to search a specified index for a key
record value and acquire the relative address of the record within the data file. The
pointer within the data file is then utilized to locate the record. In this way the integrity
of the pointers can be maintained and subsequent searches can be conducted relative to
the current pointer positions. Such an access function requires two parameters--the
database name as an input parameter and the database address as an output
parameter.
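
The access function and the “double-pointer” arrangement might be sketched as
follows. The class, the directory contents, and the addresses are invented for
illustration and do not represent the Key Record Manager's actual structures.

```python
from dataclasses import dataclass


@dataclass
class FileControlBlock:
    index_list_address: int   # pointer to the list of key record indexes
    data_start_address: int   # pointer to the start of the data on the disc


# Hypothetical database directory: database name -> FCB.
DIRECTORY = {"TLOCD": FileControlBlock(index_list_address=4096,
                                       data_start_address=1_048_576)}


def open_database(name):
    """Input: the database name. Output: its FCB (the database address)."""
    try:
        return DIRECTORY[name]
    except KeyError:
        raise FileNotFoundError(f"no database named {name!r} on this disc") from None


def record_address(fcb, relative_offset):
    """Combine the data-file origin with a relative offset found in an index."""
    return fcb.data_start_address + relative_offset


fcb = open_database("TLOCD")
print(record_address(fcb, 320))   # absolute address of a record at offset 320
```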

The primary objective of the TLOCD system is to obtain historical data about
a particular NIIN for a specified date. Therefore, the most important fields within the
data records are the NIIN and date fields. The NIIN is used to generate a key record
index. The date field is not used as an index generator. It would not provide a
practical key record index since there could be possibly hundreds or thousands of
transactions conducted on that particular date. Other fields that would generate
adequate key record indexes include the National Stock Number (NSN) and the
item noun name. However, since the TLOCD system users deal primarily with the
NIIN and seldom have the need for additional identifiers, no other key indexes would
be utilized on a regular basis.


Normally, indexes are numbered sequentially and the user is queried as to 
which index he desires to search. However, since only the NIIN index is to be created 
for the TLOCD system modification, no query is needed and the NIIN index is 
selected by default. The user is prompted to enter the NIIN and the date, if it is known 
or desired. The NIIN is located in the index via a balanced tree search. A pointer is 
then followed to a list of date records containing the dates on which the NIIN was 
transacted and the offsets of their associated NIIN records within the file. The dates 
are listed in ascending numerical order according to their Julian equivalents. The 
NIIN record offset is retrieved, the record address is computed, and the pointer is moved to 
the desired record of the NIIN file. Input parameters for such a search function 
include: (1) the database address, (2) the index to be searched, (3) the NIIN, and (4) 
the date. The function will return the record offset in relation to the NIIN file origin. 
If no date is specified, the function will return the offset for the earliest recorded 
transaction for the specified NIIN. See Figure 9.3 for an illustrative example. 
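
The search just described can be sketched in a few lines of Python. This is a minimal 
illustration under stated assumptions, not the TLOCD implementation: a binary search 
over a sorted key list stands in for the balanced tree search, the index is held in 
memory, the database address parameter is omitted, and the sample NIINs, the YYDDD 
integer encoding of Julian dates, and the name find_record_offset are all hypothetical. 

```python
from bisect import bisect_left

# Hypothetical index: NIIN keys in ascending order, each paired with a list of
# (julian_date, record_offset) entries kept in ascending date order.
NIIN_INDEX = (
    ["000123456", "001234567", "009876543"],
    [
        [(87001, 0), (87015, 3)],
        [(86200, 1)],
        [(87010, 2), (87020, 4), (87045, 5)],
    ],
)


def find_record_offset(index, niin, date=None):
    """Return the record offset relative to the data file origin, or None."""
    keys, date_lists = index
    # Binary search over the sorted keys (a stand-in for the balanced tree).
    i = bisect_left(keys, niin)
    if i == len(keys) or keys[i] != niin:
        return None  # NIIN is not in the index
    dates = date_lists[i]
    if date is None:
        return dates[0][1]  # no date given: earliest recorded transaction
    for julian_date, offset in dates:
        if julian_date == date:
            return offset
    return None  # NIIN was not transacted on that date
```

Calling find_record_offset(NIIN_INDEX, "009876543", 87020) follows the NIIN to its date 
list and returns the offset stored beside that date; leaving the date out returns the 
offset of the first entry in the list. 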

Once the record is located in the data file, its contents must be retrieved and 
displayed for the user. There are various methods that can be used to achieve the task. 
One such method involves the use of a function similar to the “scan” function found in 
the C programming language. In such a technique, the record is treated as a string of 
bytes and the string is “scanned” or read into a buffer. The contents of the buffer are 
then displayed on the screen. In order to make any sense of the data, other functions 
must be called upon to format the record string into a readable form. The record 
size must be known so the scan function can determine how many bytes to transfer 
into the buffer. This poses no problem for the TLOCD system since its records are of 
fixed length. However, for variable length records, the scan function would have to be 
designed to look for a length field at the beginning of each record, or else receive the 
information from the search function. Data retrieval can be similarly executed by string 
manipulation functions commonly found in such programming languages as Pascal and 
Ada. Retrieval programs written in C warrant more consideration due to the 
language’s powerful screen formatting functions. 
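
As a rough illustration of fixed-length retrieval, the Python sketch below reads one 
record into a buffer and then formats it for display. The record layout (a 9-byte 
NIIN, a 5-byte Julian date, and a 4-byte quantity), the sample data, and the function 
names are assumptions made for the example; the actual TLOCD record layout is not 
reproduced here. 

```python
import io
import struct

# Hypothetical fixed-length layout: 9-byte NIIN, 5-byte Julian date,
# 4-byte unsigned big-endian quantity (18 bytes per record, no padding).
RECORD = struct.Struct(">9s5sI")


def read_record(data_file, offset):
    """Read the record at the given record offset and split it into fields."""
    data_file.seek(offset * RECORD.size)   # fixed length makes the address trivial
    buffer = data_file.read(RECORD.size)   # "scan" exactly one record into a buffer
    if len(buffer) < RECORD.size:
        raise EOFError("offset lies past the end of the data file")
    niin, date, quantity = RECORD.unpack(buffer)
    return {"niin": niin.decode("ascii"),
            "date": date.decode("ascii"),
            "quantity": quantity}


def format_record(record):
    """Turn the raw fields into a line suitable for screen display."""
    return "NIIN {niin}  DATE {date}  QTY {quantity:>6}".format(**record)


# An in-memory stand-in for the data file on the disc.
DEMO_FILE = io.BytesIO(
    RECORD.pack(b"000123456", b"87001", 10)
    + RECORD.pack(b"001234567", b"86200", 250)
)
```

Here print(format_record(read_record(DEMO_FILE, 1))) displays the second record. For 
variable length records the read would first have to obtain the record size, as noted 
above, before it could know how many bytes to transfer. 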

3. Other Issues 

No system design can afford to ignore the needs and desires of its user 
environment. Systems that are not user friendly seldom make an impact in the market 
place. TLOCD users, for example, have indicated dissatisfaction with the 
“page up” and “page down” functions that permit them to move forward or backward 
within the data file only one record at a time. They would benefit from a scroll 
function which would allow them to move forward or backward within the file any 
number of records. Such a function would not be hard to implement and would add 
flexibility for users. The user would provide an integer (positive or negative) input for 
the number of records he wishes to scroll over. Since the records are of fixed length, 
such a function could readily compute the new position of the record in the data file 
and then reposition the pointer to that location. The function would require three 
input parameters: (1) current pointer position, (2) record length, and (3) number of 
records to scroll. It would pass the new record location as an output parameter. An 
attempt to scroll past the beginning or end of the data file would result in retrieval of 
the first or last record in the file. 
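
The position arithmetic for such a scroll function is simple enough to sketch directly. 
The Python fragment below is illustrative only: the function name is hypothetical, and 
a fourth parameter, the data file size in bytes, is assumed in addition to the three 
inputs listed above because the function needs it to detect the end of the file. 

```python
def scroll(current_position, record_length, count, file_size):
    """Return the byte position reached after scrolling `count` records.

    `count` may be negative (scroll backward). Positions are byte offsets
    from the start of the data file and are assumed to be record-aligned.
    """
    last_position = (file_size // record_length - 1) * record_length
    target = current_position + count * record_length
    # Clamp so that overshooting either end lands on the first or last record.
    return max(0, min(target, last_position))
```

With 18-byte records in a 180-byte file, scroll(36, 18, 2, 180) moves from the third 
record to the fifth, while scroll(36, 18, -5, 180) stops at the first record rather 
than running off the front of the file. 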

Another issue to be concerned with is the arrangement of data on the terminal 
screen. The current TLOCD screen interface displays a transaction record for a specific 
NIIN and then queries the user as to whether he wants to view a closing balance or 
audit trail record for the NIIN. Therefore, the user is aware that he must deal with 
three separate groups of files. The user has no need to know such information and the 
system should make it transparent to him. Furthermore, the screen interface should 
display data from across all three TLOCD relations upon each NIIN inquiry. The 
result would be a fuller screen with multiple records being used to provide transaction, 
closing balance, and audit trail data about the NIIN. The need would no longer exist to 
prompt the user after each NIIN search about closing balance or 
audit trail data. 
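
One way to picture the combined display is a routine that gathers rows from all three 
relations for a NIIN and returns a single screen of text. The sketch below is only an 
illustration: the dictionaries stand in for the transaction, closing balance, and 
audit trail files, and the sample rows and the name build_screen are invented for the 
example. 

```python
# Hypothetical stand-ins for the three TLOCD relations, keyed by NIIN.
TRANSACTIONS = {"000123456": ["87001 ISSUE 10", "87015 RECEIPT 40"]}
CLOSING_BALANCES = {"000123456": ["87031 ON HAND 130"]}
AUDIT_TRAIL = {"000123456": ["87015 ADJUSTMENT 2"]}

SECTIONS = [
    ("TRANSACTIONS", TRANSACTIONS),
    ("CLOSING BALANCE", CLOSING_BALANCES),
    ("AUDIT TRAIL", AUDIT_TRAIL),
]


def build_screen(niin):
    """Collect rows for one NIIN from every relation into a single display."""
    lines = ["NIIN " + niin]
    for title, relation in SECTIONS:
        lines.append(title)
        rows = relation.get(niin) or ["(no records)"]
        lines.extend("  " + row for row in rows)
    return "\n".join(lines)
```

A single call such as print(build_screen("000123456")) then shows all three groups at 
once, so the user never has to be asked which file to consult next. 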

The design of a user-friendly interface to a system is a complex task that goes 
beyond the scope of this thesis. The above examples serve to illustrate that these issues 
must be carefully analyzed to provide user satisfaction. 

X. CONCLUSIONS AND RECOMMENDATIONS 


The U.S. Navy is constantly exploring, experimenting, and seeking new 
technologies in order to maintain a tactical advantage over its adversaries. CD-ROM 
technology warrants immediate attention and funding for implementation and 
applications development. 

CD-ROM applications provide a potentially valuable commodity to the U.S. 
Navy at shore facilities and on board ships at sea. The product is already proven and 
the financial risks are minimal. Major shore facilities should proceed and adopt plans 
to convert their permanent and archival databases to CD-ROM applications such as 
the TLOCD system. The technology is available and is already starting to earn a 
significant niche in the electronic data processing industry. Although an 
implementation reflecting the proposed TLOCD modifications presented in the 
previous chapter cannot be carried out within the scope and time frame of this thesis, it 
can be determined from the information presented that such an implementation is 
plausible and doable within U.S. Navy environments. 

CD-ROM is the catalyst that will eventually lead to the first paperless ship. Its 
use in conjunction with other developing electronic technology such as WORM makes 
this goal reachable. The Navy should designate a ship to function as a prototype for 
CD-ROM conversion. The prototype must apply sound database design principles 
such as those emphasized in this study in order to produce efficient and effective 
performance. It must also address the functionality of the user interfaces designed for 
each specific application on an independent basis. If these guidelines are followed, the 
CD-ROM applications will produce immediate cost savings and increase efficiency and 
operational readiness by providing faster access to critical data. If current research and 
development cannot economically produce a feasible optical storage solution (such as 
WORM or erasable discs) for constantly changing data, then the chances for a 
“paperless” ship in the near future are greatly reduced. Regardless of that outcome, 
CD-ROM will remain reliable and cost-effective for shipboard use provided proper 
analysis is conducted prior to system integration. 